Coronavirus misinformation needs engagement


Researchers must be open and transparent — 
acknowledge what is known and what isn’t. 


The past few weeks have seen an explosion in
misleading claims about COVID-19. These are
mostly online, and many are intended to sow
doubts about vaccination as a way to protect
against infection. For the individuals and
organizations involved in such disinformation, the pandemic is
a gilded opportunity. They are capitalizing on both the 
many unknowns about the SARS-CoV-2 virus and the dis- 
ease it causes, as well as the many legitimate questions 
about safety and efficacy as vaccines are being developed 
at unprecedented speed. 

Vaccines must be safe and effective. Once (and only 
once) this is proven, immunization campaigns need to be 
comprehensive to succeed. But this presents many chal- 
lenges. For low-income countries, and in those without uni- 
versal health care, a key obstacle is ensuring that vaccines 
are available and affordable. For certain higher-income 
countries — for example some in Europe — the challenge for 
coronavirus will be to overcome scepticism about vaccines, 
which is being fuelled by false information. 

Researchers can play a part. Knowing what to do in the 
middle of a pandemic isn’t straightforward. But for those 
considering how to respond to the kinds of questions that 
everyone is asking, and what to do about disinformation, 
there are ways to help. 


Tackling disinformation 


As Nature reports on page 371, misinformation (false 
information) and disinformation (information that is 
deliberately misleading) are complex. Some politicians 
are spreading virus disinformation to burnish their image 
and influence among their supporters. There are organiza- 
tions that have set up disinformation websites — including 
money-making scams. Very little, if any, of this information
will have been put through an open process of verification 
and review. For consumers, it can be a double whammy — 
they are paying, and also being misinformed or misled. 
Public-health agencies and technology firms are aware 
of the harm being done and are working to respond. To 
their credit, platforms such as Facebook and YouTube are 
more active in taking down posts where there is a clear risk 
to public health. When questions such as “are vaccines 
safe” are typed into Google, the search algorithms are 
listing sources that provide evidence-based information. 
But for every item of misinformation and disinformation
that is dealt with, more pop up. Moreover, sites have
discovered ways to circumvent artificial-intelligence tools 
and harried moderators, and that makes the role of human 
fact-checkers more important. 

One thing that researchers can do is to work with 
organizations that are responding to disinformation. 
They can support or join in the work of professional 
fact-checkers, journalists and academics, doggedly fol- 
lowing bots and disinformation-news sites, flagging their 
content to the media organizations and social-media firms 
that host these sites. Groups all over the world are involved 
in this response — including professional bodies, learned 
societies and media-facing organizations. The work they 
do is labour-intensive and can seem never-ending, but it 
is needed now more than ever. 


Public engagement and transparency 


Many people are asking important questions on subjects 
such as the safety of proposed vaccines, the security of 
contact-tracing apps and how intellectual property rights 
and profits from new drugs and vaccines will be shared. 
These are questions that researchers from fields such 
as public health, data security and health-care finance 
are also asking. If they are not already doing so, now is 
the time for these and other researchers to expand their 
public engagement. 

It might be that a definite answer isn’t known, or that 
there is a range of possible answers. That is often the case
in science. The study and practice of public engagement 
in science has shown that involving communities in the 
kinds of conversations that researchers have — conversa- 
tions about how scientists search for evidence, and being 
transparent about what is known and not known — all helps 
to create and maintain trust. 

A year ago, the UK biomedical funding charity Wellcome
published the results of a large global survey into vaccines, 
involving 140,000 participants in 140 countries. It found 
that around 80% of respondents considered vaccines 
safe and effective. Confidence was highest in low-income 
countries — notably Bangladesh and Rwanda — where 
public-awareness campaigns against infectious diseases 
such as malaria, typhoid and hepatitis are common. 

By contrast, confidence in the importance of vaccines 
was lower in Europe, where populations are compara- 
tively free of infectious diseases, but now have some of 
the highest deaths and infections from COVID-19. Some 
22% of respondents from Europe are not confident 
that vaccines are safe, and this figure increases to 33% 
for France. Wellcome’s findings reflect those from the 
European Commission’s own State of Vaccine Confidence 
report from 2018. Across the European Union, health 
ministries are unable to meet their own target — set after 
the 2009 H1N1 swine flu outbreak — of vaccinating 75% of
over-65s against flu. 

Last November, Heidi Larson, an anthropologist at the 
London School of Hygiene and Tropical Medicine — and a 
co-author of the European Commission report — warned in
an interview with Nature that if there is “another very seri- 
ous influenza pandemic sooner or later, and if the public 
opt to forgo vaccination the way they did during the 2009 
swine-flu pandemic, we're in deep trouble” (S. el-Showk
Nature 575, S57; 2019). 

That is why there is work to be done. When it comes to 
communicating emerging information on research, the 
lessons from studies and from past practice are clear: do
not over-promise or oversell, and emphasize what is
known and what isn’t. In the case of vaccines, it means being
as transparent as possible about how vaccines are made, 
how they work, what they contain and how they will be 
tested, and always being upfront about the evidence for 
their effectiveness, possible risks and side effects. 

Researchers should play a part — no matter how small
— in the response to misinformation and disinformation.
We need to build a society that is resilient to falsehoods 
about COVID-19, a task that will only become more vital 
as vaccines near. 


Milestone in human genetics highlights diversity gap


Landmark study identifies genes that it seems 
people can and cannot live without. But such 
data still need to be more representative. 


From the time that the nineteenth-century monk
Gregor Mendel squinted at the pea plants in his
garden and wondered why some had white flowers
or wrinkled seeds, it has been a tradition in
biology to observe what goes awry when a DNA
sequence is altered — whether that variation occurs
naturally or through human intervention.

Although geneticists have long been able to introduce 
genetic mutations into model organisms such as the fruit 
fly — first with X-rays or chemicals, and now with more 
sophisticated gene-editing tools — where humans are con- 
cerned, the toolbox is more limited. Researchers clearly 
cannot intentionally introduce mutations into humans; 
instead, they must use what nature provides. As a result, 
they comb through genomes in search of variations in 
DNA sequences, and use statistical tools to determine 
whether those variations contribute to traits and diseases. 
As genome sequencing has become quicker and cheaper, 
those studies have become bigger and more complex. 

This week, three journals in the Nature family are publish- 
ing the results of the latest effort: a study of a staggering 
125,748 exomes (the part of the genome that codes for
proteins) and 15,708 whole genomes (see go.nature.com/2zgfxr2).
The study — the most extensive publicly accessible
analysis carried out so far — sheds light on which genes 
are essential and which a person might be able to live with- 
out. The results are compiled in the Genome Aggregation 
Database (gnomAD) and will help researchers to better
understand the roots of genetic disorders, and, eventu- 
ally, how best to treat them. That mutations can inactivate 
genes is hardly new, but this study adds to the surprisingly 
long list of mutations that can obliterate a gene’s function 
without causing obvious harm. The study also identified 
a flurry of genes that are probably vital for life, because 
people rarely harbour drastic mutations predicted to cause 
‘loss of function’ in these genes. 

The study’s large scale made it possible for the authors 
to devise a measure of how tolerant to loss-of-function 
mutations a given gene might be. This is a useful tool with 
which to study the function of known and newly identified 
genes, to pinpoint candidate disease-causing mutations, 
and to find new drug targets in the human genome. 
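
One simple way to picture a tolerance measure of this kind is to compare how many loss-of-function variants are seen in a gene with how many would be expected by chance. The sketch below is an illustration only, not the authors' method: the gene names and counts are invented, and a real metric would also need a statistical treatment of uncertainty, which is omitted here.

```python
from dataclasses import dataclass


@dataclass
class GeneCounts:
    """Hypothetical per-gene tallies; names and numbers are invented."""
    name: str
    observed_lof: int      # loss-of-function variants actually seen
    expected_lof: float    # number expected under a neutral model


def oe_ratio(gene: GeneCounts) -> float:
    """Observed/expected ratio: low values suggest intolerance to loss of function."""
    if gene.expected_lof <= 0:
        raise ValueError("expected count must be positive")
    return gene.observed_lof / gene.expected_lof


genes = [
    GeneCounts("GENE_A", observed_lof=2, expected_lof=40.0),
    GeneCounts("GENE_B", observed_lof=35, expected_lof=38.0),
]

# Rank from least tolerant (lowest ratio) to most tolerant.
ranked = sorted(genes, key=oe_ratio)
for gene in ranked:
    print(f"{gene.name}: o/e = {oe_ratio(gene):.2f}")
```

In this toy ranking, a gene with far fewer loss-of-function variants than expected sorts to the top as a candidate for being essential, whereas a gene close to its expected count looks dispensable.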

One example is the team’s evaluation of the gene LRRK2, 
which has been implicated in Parkinson’s disease (N. Whiffin
et al. Nature Med. https://doi.org/10.1038/s41591-020-0893-5;
2020). DNA variants that increase the activity of
the LRRK2 protein have been associated with a higher risk 
of the disease, leading scientists to think that a drug that 
switches the gene off could be beneficial. But would turning 
off LRRK2, which is active in the brain, as well as in other
tissues, be dangerous? Looking through gnomAD’s 140,000
genomes and exomes, the authors found many naturally 
occurring DNA sequence variants that switch off LRRK2. 
That suggests — at least in principle — that a drug that can 
mimic this effect might not be harmful. 

To answer such questions, a very large number of sam- 
ples is needed, in part because DNA sequence variations 
that wipe out the function of an important gene are likely 
to be rare. This means that the more genomes scientists 
can analyse, the more variants they can find and the bet- 
ter they can pick apart the effects of each one. But such 
projects also need a greater diversity of participants than 
they have had thus far. 

In the current studies, around half of the samples were 
donated by people of European descent. Although this is 
an improvement on previous studies, people from regions
such as Central Asia, Oceania, the Middle East and much of
Africa are almost absent. This means researchers are prob- 
ably missing variants that are important for understanding 
gene function — and disease risk — in these regions. This 
is something that consortium members recognize, but 
progress is slow. Researchers and funders must incentivize 
such work to ensure that it continues to expand. 

The gnomAD database is an outstanding resource. The 
willingness of participants to contribute — along with the 
willingness of researchers to share — has been key to its suc- 
cess. Further insights will come from combining sequence 
data with clinical information. Projects such as the Estonian 
Biobank, which includes more than 200,000 participants, 
and the UK Biobank, which has DNA and health information
from 500,000 people, are paving the way. But such efforts 
need the involvement of more-diverse populations. 

With these improvements, researchers will be able to 
maximize the contribution of everyone who provided their 
DNA samples to improve our knowledge of human biology 
and to fully harness genetic differences to benefit us all. 
A personal take on science and society 


World view 


By Andy Haldane 


Model lives and livelihoods in lockstep


Economists must improve tools to weigh 
trade-offs between health and wealth. 


Last month I lost a co-author and friend whose
interdisciplinary work now seems chillingly
prescient. Robert May, once chief scientific
adviser to the UK government and president of
the Royal Society, did ground-breaking work on
disease contagion, among other things.

A decade ago, May and I published a paper in Nature 
using models drawn from epidemiology to understand 
the dynamics of the global financial crisis of 2008-09, 
and appropriate policy responses to it. This instigated 
new thinking on the modelling of economies and finan- 
cial systems. It also helped to demonstrate the value of 
cross-disciplinary research involving economics. Neither 
had featured strongly before the crisis. 

The work made clear that economies and financial sectors, 
as connected social systems, have classic robust-yet-fragile 
properties (A. G. Haldane and R. M. May Nature 469, 351-355; 
2011). They are prone to periodic, self-generated tipping 
points, in which they shift quickly into a new state from 
which they can’t return. Avoiding these means strengthen- 
ing the systems’ resilience by building up buffers to cush- 
ion stress, particularly among the largest, most connected 
financial firms — super-spreaders, if you will. 

Those findings were the opposite of the orthodoxy 
pre-crisis, when super-spreader banks, and the financial 
system as a whole, ran with dangerously slender buffers 
of capital and liquidity. Encouragingly, in the decade since
we published our paper, financial policy has been radically
reformed and significantly larger buffers have been built up.

It is just as well. The COVID-19 crisis has again exposed the
fragilities of social and economic systems and how they can
operate on a knife edge. This time, the source of the threat
is public health rather than financial wealth, but again the 
risk is systemic and chronic. Public-health concerns have, 
rightly, taken priority in the setting of policy. 

But the social-distancing measures put in place around 
the world to limit viral transmission have come at a sig- 
nificant economic and financial cost. As businesses and 
households have locked down, there has been a collapse in 
global activity and spending, unprecedented in its speed 
and severity. This has prompted a similarly unprecedented 
loosening of fiscal and monetary policies. 

Public policy remains on the horns of a dilemma in many
countries. It is navigating a narrow path that can seem to 
pit livelihoods against lives, the needs of older genera- 
tions against those of younger ones, the health benefits 
of physical distance against the social benefits of societal 
cohesion. These are trade-offs the like and extent of which
policymakers have never seen.

Andy Haldane is chief economist of the Bank of England in London.
e-mail: andy.haldane@bankofengland.co.uk

It is here that economists have an important role: we have
always articulated and calibrated such trade-offs. This has
given economists influence over some of the most difficult
choices facing policymakers in recent decades, including
how to tackle the other existential crisis — climate change.

Encouragingly, economists have quickly risen to the 
challenge. As one example, the Centre for Economic Policy 
Research in London is now publishing articles several times 
each week, gathering together economic research on the 
financial and social impact of COVID-19. 

These insights have often come from combining model- 
ling approaches from the natural and social sciences. For 
example, embedding an SIR (Susceptible, Infected, Recov- 
ered) model of disease dynamics in a general equilibrium 
model of people’s spending decisions allows us to capture 
and calibrate some of the difficult trade-offs. 
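
The disease-dynamics half of such a hybrid can be sketched in a few lines. The example below is a minimal illustration, not a model from the research described: it steps a basic SIR system forward with a simple Euler scheme, and it stands in for social distancing by crudely halving the transmission rate. The parameter values are assumptions chosen for illustration, and the economic side (the general equilibrium model of spending decisions) is not represented at all.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Euler integration of the SIR equations on population fractions.

    beta  : transmission rate per day (illustrative value, an assumption)
    gamma : recovery rate per day (illustrative value, an assumption)
    Returns a list of (time, susceptible, infected, recovered) tuples.
    """
    s, i, r = s0, i0, r0
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((k * dt, s, i, r))
    return history


def peak_infected(history):
    """Largest infected fraction reached during the simulation."""
    return max(i for _, _, i, _ in history)


# Baseline epidemic versus one with contacts (and so beta) cut in half.
baseline = simulate_sir(beta=0.3, gamma=0.1, s0=0.999, i0=0.001, r0=0.0, days=300)
distanced = simulate_sir(beta=0.15, gamma=0.1, s0=0.999, i0=0.001, r0=0.0, days=300)

print(f"peak infected, baseline:  {peak_infected(baseline):.3f}")
print(f"peak infected, distanced: {peak_infected(distanced):.3f}")
```

A combined model would go further and let the transmission rate depend on people's choices about work and consumption, which is where the trade-off between health and economic activity enters.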

Despite this rapid progress, these models are still too 
fledgling and crude to provide robust advice to policy- 
makers weighing economic and health outcomes — for 
example, deciding when and how best to ease social 
distancing. One of the most pressing analytical challenges 
ahead, then, is to advance these models. 

After the 2008-09 financial crisis, reform focused on 
protecting those financial-sector activities that were 
most crucial to the public. These were not the high-risk, 
high-return activities that sowed the seeds of the crash. 
They were the everyday essentials of banking — the making 
of payments and loans to individuals and businesses. 
Post-crisis, these activities were ring-fenced to add resil- 
ience and useful redundancy to the financial system. 

After the current crisis, we must ask questions about the 
resilience of our health- and social-care sectors and of the 
economy generally. How can we ensure that the economy 
can produce enough kit and people to meet the needs of 
health- and social-care systems? What other activities need 
to be ring-fenced to ensure that they are resilient to future 
extreme events, from viruses to cyberattacks and cyclones? 

To answer these questions, we need highly granular data 
embedded in high-dimensional models of many interacting 
agents. Such agent-based models have been used exten- 
sively in the natural sciences, to study everything from 
ecosystems to galaxies. They have been used much less 
in understanding our economies. 

Developing these models will not be easy. It calls for a 
concerted, bridge-building effort involving statisticians, 
physicists, epidemiologists, meteorologists, sociologists 
and economists. Intellectual cross-pollination of the 
kind that I (a policymaker) enjoyed with May (an ecolo- 
gist) could, in time, help to contain the sorts of viral and 
economic contagion that are imposing such high costs on 
the world today. 
The world this week


News in brief


ARCHAEOLOGICAL WORKHORSE 
GETS MAJOR REBOOT 


Radiocarbon dating — a key 
tool for determining the age of 
prehistoric samples — is due to 
be recalibrated using a slew of 
new data from around the world. 
The work combines thousands 
of data points from tree rings, 
lake and ocean sediments, corals 
and stalagmites, among other 
features, and means that the 
technique can be used to judge 
dates back to 55,000 years ago — 
5,000 years further than the last 
calibration update, in 2013. 

Archaeologists are 
downright giddy. “Maybe I’ve 
been in lockdown too long,” 
tweeted Nicholas Sutton, an 
archaeologist at the University 
of Otago in Dunedin, New 
Zealand, “but... I’m really 
excited about it!” 

The basis of radiocarbon 
dating is simple: all living 
things absorb carbon from the 
atmosphere and food sources 
around them, including a certain 
amount of natural, radioactive 
carbon-14. When the plant or
animal dies, it stops absorbing,
and the radioactive carbon that
it has accumulated continues
to decay. Measuring the amount
left over gives an estimate as to 
how long something has been 
dead. 
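
The basic calculation can be written down directly from the law of radioactive decay. The sketch below is illustrative: it assumes the commonly quoted carbon-14 half-life of about 5,730 years and takes as input the fraction of the original carbon-14 that remains, ignoring measurement error and any calibration.

```python
import math

# Commonly quoted half-life of carbon-14, in years (an assumption of this sketch).
HALF_LIFE_YEARS = 5730.0


def radiocarbon_age(fraction_remaining: float) -> float:
    """Uncalibrated age in years from the fraction of carbon-14 left.

    Solves fraction = (1/2) ** (age / half_life) for age.
    """
    if not 0.0 < fraction_remaining <= 1.0:
        raise ValueError("fraction_remaining must be in the interval (0, 1]")
    return -HALF_LIFE_YEARS / math.log(2) * math.log(fraction_remaining)


for fraction in (1.0, 0.5, 0.25, 0.01):
    print(f"{fraction:>5.2f} of carbon-14 left -> about {radiocarbon_age(fraction):,.0f} years")
```

Each halving of the remaining carbon-14 adds one half-life to the estimated age, which is why very old samples, with only a tiny fraction left, are so hard to date precisely.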


But this basic calculation 
assumes that the amount of 
carbon-14 in the environment 
has been constant in time 
and space — which it hasn't. In 
recent decades, the burning of 
fossil fuels and tests of nuclear 
bombs have radically altered 
the amount of carbon-14 in 
the air, and there are non- 
anthropogenic wobbles going 
much further back. 

As a result, researchers have
created conversion tables that 
match calendar dates with 
radiocarbon dates for different 
times and regions. Scientists 
are releasing new curves for the 
Northern Hemisphere (a model 
called IntCal20), Southern 
Hemisphere (SHCal20) and 
marine samples (MarineCal20). 
They will be published in the 
journal Radiocarbon in the next
few months. 
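
A conversion table of this kind can be pictured as a lookup with interpolation between tabulated points. The sketch below is a toy illustration only: the table values are invented rather than taken from any published curve, and it assumes a one-to-one relationship between radiocarbon and calendar ages, whereas real calibration has to cope with wiggles in the curve and with measurement uncertainty.

```python
from bisect import bisect_left

# Hypothetical (radiocarbon age, calendar age) pairs in years before present.
# These numbers are invented for illustration and are NOT real calibration data.
TOY_CURVE = [
    (1000.0, 930.0),
    (2000.0, 1950.0),
    (3000.0, 3200.0),
    (4000.0, 4480.0),
]


def calibrate(radiocarbon_age: float, curve=TOY_CURVE) -> float:
    """Convert a radiocarbon age to a calendar age by linear interpolation."""
    ages = [rc for rc, _ in curve]
    if not ages[0] <= radiocarbon_age <= ages[-1]:
        raise ValueError("age lies outside the range covered by the table")
    j = bisect_left(ages, radiocarbon_age)
    if ages[j] == radiocarbon_age:
        return curve[j][1]          # exact table entry
    (x0, y0), (x1, y1) = curve[j - 1], curve[j]
    return y0 + (y1 - y0) * (radiocarbon_age - x0) / (x1 - x0)


print(calibrate(2500.0))
```

Extending the table further back in time, as the new curves do, simply widens the range of ages for which such a conversion is possible.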

Tom Higham, an 
archaeological chronologist 
and director of the Oxford 
Radiocarbon Accelerator Unit, 
UK, says that recalibration is 
fundamental for understanding 
the chronology of hominins 
living 40,000 years ago, among 
other things. “I am really excited
about calibrating our latest data 
using this curve.” 


VIRTUAL CONFERENCES FIND 
FAVOUR WITH SCIENTISTS 


More than 80% of respondents 
to a Nature reader poll said
that they would be in favour of 
some scientific conferences 
remaining virtual even after 
the coronavirus pandemic 
ends. Many meetings have been 
pushed online since March as 
a result of the global COVID-19
outbreak — including large, 
flagship conferences that 
usually attract thousands of 
attendees. 

More than 40% of the roughly 
500 survey respondents said 
that they had attended an online 
meeting (see ‘Virtual reality’). 

Readers lauded some 
aspects of virtual meetings 
— in particular, improved 
accessibility, low costs and 
avoiding the hassle of travel. 

“I know colleagues around the 
world with limited budgets 
who’ve also suddenly been 
able to attend many more 
meetings,” says Tom Brown,
who studies energy-system 
modelling at Karlsruhe Institute 
of Technology in Germany. 
Some respondents found 
presentations to be clearer, 
liked that they could rewatch 
recorded talks, and felt that
it is easier to speak up as an
audience member using digital
tools than it is in a meeting hall.


Downsides included clunky 
technology, connection issues 
and, most notably, the lack 
of serendipitous encounters, 
human interaction and 
socializing. 

But many respondents 
thought that virtual conferences 
would improve in time and that 
the digital experience could 
help to make in-person meetings 
better, for instance through 
improving technologies that 
allow attendees to exchange 
data, knowledge and opinions 
during presentations. 

Some advocated a hybrid 
model — a face-to-face meeting 
with increased virtual elements. 
“Some of the best parts of a 
conference, such as informal
meetings, local flavour and 
more time to talk to speakers 
after sessions were completely 
lost. A hybrid system might 
recover some of these benefits,” 
says Paul DeStefano, a physicist 
at Portland State University in 
Oregon. 


VIRTUAL REALITY

In a Nature online poll, more than 40% of respondents said they had attended a
scientific meeting held online because of the coronavirus pandemic. About 80%
said some conferences should remain virtual even after the pandemic is over.

Have you attended a conference that was run virtually
as a result of the pandemic? (499 respondents)
Yes: 41%
No, but I plan to: 27%

Do you think some meetings should continue to
be virtual after the pandemic? (486 respondents)
Yes: 81%
No: 19%

Nature online poll run 20 April - 4 May 2020.

News in focus 


Vaccines against the coronavirus are being tested in humans and animals. 


CORONAVIRUS VACCINE TRIALS HAVE 
DELIVERED THEIR FIRST RESULTS — 
BUT THEIR PROMISE IS STILL UNCLEAR 


Scientists urge caution over hints of success 
emerging from early human and animal studies. 


By Ewen Callaway 


As coronavirus vaccines hurtle through
development, scientists are getting
their first look at data that hint at
how well different vaccines are likely
to work. The picture, so far, is murky.

On 18 May, US biotech firm Moderna 
revealed the first data from a human trial: 
its COVID-19 vaccine triggered an immune 
response in people, and protected mice 
from lung infections with the coronavirus
SARS-CoV-2. The results — which the
company, based in Cambridge, Massachusetts,
announced in a press release — were widely
interpreted as positive and sent share prices 
surging. But some scientists say that, because 
the data haven't been published, they lack the 
details needed to properly evaluate those 
claims. 

Tests of other fast-tracked vaccines show 
that they have prevented infections in the 
lungs of monkeys exposed to SARS-CoV-2 — 
but not in some other parts of the body. One 
— a vaccine being developed at the University
of Oxford, UK, that is also in human trials
— protected six monkeys from pneumonia,
but the animals’ noses harboured as much
virus as did those of unvaccinated monkeys,
researchers reported last week in a bioRxiv
preprint. A Chinese group reported similar
caveats about its own vaccine’s early animal
tests this month.

Despite the uncertainties, all three teams 
are pressing ahead with clinical trials. These 
early studies in people are meant mainly to 
test safety, but larger clinical trials designed to 
determine whether the vaccines can actually 
protect humans from COVID-19 could report 
in the next few months.

Still, the early data together offer clues as 
to how coronavirus vaccines might generate 
a strong immune response. Scientists say that 
animal data will be crucial for understanding 
how coronavirus vaccines work, so that the 
most promising candidates can be identified 
quickly and then refined. 

“We might have vaccines in the clinic that are 
useful in people within 12 or 18 months,” says 
Dave O’Connor, a virologist at the University of
Wisconsin–Madison. “But we’re going to need
to improve on them to develop second- and 
third-generation vaccines.” 


Immune response 


Moderna’s vaccine, which is being co-developed 
with the US National Institute of Allergy and 
Infectious Diseases (NIAID) in Bethesda, Mary- 
land, began safety testing in humans in March. 
The vaccine consists of messenger RNA instruc- 
tions for building the coronavirus’s spike 
protein; it causes human cells to churn out the 
foreign protein, alerting the immune system. 
Although such RNA-based vaccines are easy to 
develop, none has ever been licensed. 

In its press release, the company reported 
that 45 study participants who received one or 
two doses of the vaccine developed a strong 
immune response to the virus. Researchers 
measured virus-recognizing antibodies in 
25 participants, and detected levels similar 
to or higher than those found in the blood of 
people who have recovered from COVID-19. 

Tal Zaks, Moderna’s chief medical officer, 
said in a presentation to investors that these 
antibody levels bode well for the vaccine 
preventing infection. “If you get to the level 
of people who had disease, that should be 
enough.” 

But it’s not at all clear whether the responses 
are enough to protect people from infection, 
because Moderna hasn’t shared its data, says
Peter Hotez, a vaccine scientist at Baylor 
College of Medicine in Houston, Texas. “I’m 
not convinced that this is really a positive 
result,” Hotez says. He points to a 15 May 
bioRxiv preprint that found that most people
who have recovered from COVID-19 without 
hospitalization do not produce high levels of 
‘neutralizing antibodies’, which block the virus 
from infecting cells. Moderna measured these 
potent antibodies in eight trial participants 
and found their levels to be similar to those 
in recovered patients. 

Hotez also has doubts about the Oxford 
team’s first results, which found that monkeys 
produced modest levels of neutralizing anti- 
bodies after receiving one dose of the vaccine 
(the same regime that is being tested in human
trials). “It looks like those numbers need to be
considerably higher to afford protection,” says
Hotez. The vaccine is made from a chimpan- 
zee virus that has been genetically altered to 
produce a coronavirus protein. 
Rhesus macaques are used to test vaccines.


Hotez says that the vaccine being devel- 
oped by Sinovac Biotech in Beijing seems 
to have elicited a more promising antibody 
response in macaque monkeys that received 
three doses, as reported in a 5 May paper in
Science. That vaccine is made of chemically 
inactivated SARS-CoV-2 particles. 

No one yet knows the precise nature of the 
immune response that protects people from 
COVID-19, and the levels of neutralizing anti- 
bodies made by the monkeys in the Oxford 
study might be enough to protect people from 
infection, says Michael Diamond, a viral immu- 
nologist at Washington University in St. Louis, 
Missouri, who is a member of Moderna’s scientific
advisory board. If not, a second injection
would probably boost levels. “What we don’t
know is how long they'll last,” he adds. 

Still more questions hover over experiments 
showing that vaccines can protect animals from 
infection. Moderna said its vaccine stopped 
the virus replicating in the lungs of mice. The 
rodents had been infected with a version of 
the virus that was genetically modified to let 
it attack mouse cells, which are not ordinarily 
susceptible to SARS-CoV-2, according to Zaks’s 
presentation. But the mutation affects the pro- 
tein that most vaccines, including Moderna’s, 
use to stimulate the immune system; this could 
change the animals’ response to infection.

The Oxford monkeys were given a high dose
of virus after receiving the vaccine, says Sarah 
Gilbert, an Oxford vaccinologist who co-led 
the study with Vincent Munster, a virologist 
at NIAID’s laboratories in Hamilton, Mon- 
tana. This could explain why the vaccinated 
animals had just as much SARS-CoV-2 genetic 
material in their noses as did control animals, 
even though the vaccinated monkeys didn’t
develop any signs of pneumonia. Administer- 
ing high doses ensures that the animals are 
infected with the virus, but it might not repli- 
cate natural infections. The Oxford study did 
not measure whether the virus was still infec- 
tious, Diamond says, and the genetic material 
could represent virus particles inactivated by 
the monkeys’ immune response, or the viruses 
the researchers administered, rather than an 
ongoing infection. 

Still, the result raises the possibility that 
vaccinated people could still spread the virus, 
says Douglas Reed, an aerobiologist at the 
University of Pittsburgh Center for Vaccine 
Research in Pennsylvania. “Ideally, you want 
a vaccine that would protect against disease 
and against transmission,” he says. 


Safety signs 

Although assessing a vaccine’s potential 
efficacy is difficult, the latest data are clearer 
on safety, say researchers. The Moderna vac- 
cine caused few severe and no lasting health 
problems in trial participants. The vaccinated 
Oxford and Sinovac monkeys did not develop 
an exacerbated disease after infection — a key 
fear, because an inactivated vaccine for the 
related coronavirus that causes SARS (severe 
acute respiratory syndrome) showed signs of 
this in macaques.

None of the data should dissuade devel- 
opers from doing human trials to determine 
whether the vaccines work, says Stanley 
Perlman, a coronavirologist at the University 
of Iowa in Iowa City.

Moderna will soon begin a phase II trial 
involving 600 participants. It hopes to begin 
a larger, phase III, efficacy trial in July, to test 
whether the vaccine can prevent disease in 
high-risk groups, such as health-care workers 
and people with underlying medical problems. 
Zaks said that further animal studies, including
some in monkeys, were under way, and that
it wasn’t yet clear which animal would best 
predict whether and how the vaccine works. 

The Oxford team has already enrolled 
more than 1,000 people in its UK trial. Some 
volunteers have received a placebo, so trials 
could determine whether the vaccine works 
in humans over the coming months. The lack 
of safety problems in the team’s monkey study 
was reassuring, Gilbert says. 

“We don't really need any more data from 
animal trials to continue,” she says. “If we get 
human efficacy, we’ve got human efficacy, and 
that’s what matters.”

ARE WOMEN PUBLISHING
LESS DURING THE PANDEMIC? 
HERE'S WHAT THE DATA SAY 


Early analyses suggest female academics are posting 
fewer preprints than men, and starting fewer projects. 


By Giuliana Viglione 


Quarantined with a six-year-old child
underfoot, Megan Frederickson won-
dered how academics were managing
to write papers during the COVID-19
pandemic. Lockdowns implemented
to stem coronavirus spread meant that, 
overnight, many households worldwide had 
become an intersection of work, school and 
home life. Conversations on Twitter seemed 
to confirm Frederickson’s suspicions: female 
academics, taking up increased childcare 
responsibilities, were falling behind their male 
peers at work. 

But Frederickson, an ecologist at the 
University of Toronto, Canada, wanted to see 
what the data said. So, she looked at preprint 
servers to investigate whether women were 
posting fewer studies than they were before 
lockdowns began. The analysis — and several 
others — suggests that, across disciplines, 
women’s publishing rate has fallen relative 
to men’s amid the pandemic (see go.nature.com/2a5uwv5).

The results are consistent with the litera- 
ture on the division of childcare between men 
and women, says Molly King, a sociologist at 
Santa Clara University in California. Evidence 
suggests that male academics are more likely 
to have a partner who does not work outside 
the home; their female colleagues, especially 
those in the natural sciences, are more likely 
to have a partner who is also an academic. 
Even in those dual-academic households, the 
evidence shows that women perform more 
household tasks than men do, she says. King 
suspects the same holds true for childcare. 


Preprint analysis 


In her analysis, Frederickson focused on the 
two preprint servers that she uses: the physi- 
cal-sciences repository arXiv, and bioRxiv for 
the life sciences. To determine the genders of 
more than 73,000 authors named on 36,529 
preprints, she compared the names with those 
in the US Social Security Administration’s baby-
name database, which registers the names and
genders of children born in the United States. 
Frederickson looked at arXiv studies posted 
between 15 March and 15 April in 2019 and in 
2020. The number of women who authored 
preprints grew by 2.7% from 2019 to 2020 — but
the number of male authors increased by 6.4%
over that period. The increase in male author-
ship of bioRxiv preprints also outstripped that
of female authorship, although by a smaller
margin (see ‘Preprint drop-off’). (The two
servers are not directly comparable in Fred-
erickson’s analysis, because the program that
she used pulled the names of only correspond-
ing authors from bioRxiv, whereas all arXiv
authors were included.)

PREPRINT DROP-OFF

Two separate analyses show that women's
posting rate on preprint servers has slowed
during the coronavirus pandemic.

All-author analysis

When compared with March and April 2019, the
number of male authors on preprints posted to
bioRxiv and arXiv has grown faster than the
number of female authors in that period this year.

First-author analysis

At many preprint servers, women were
submitting at a lower rate in March and April,
as compared with the preceding two months
and the same months of the previous year.

“The differences are modest, but they’re 
there,” Frederickson says. She notes that lock- 
downs so far have been short compared with 
the usual research timeline, so the long-term 
effects on women’s careers are still unclear. 

The limitations of these types of name- 
based analysis are well known. Using names to 
predict gender can exclude non-binary people, 
and can misgender others. They are more likely 
to exclude authors with non-Western names. 
And, between disciplines, their utility can vary 
because of naming conventions — such as the 
use of initials instead of given names, as is com-
mon in astrophysics. Still, says Frederickson,
over a large sample size, they can provide valua-
ble insights into gender disparities in academia.


Fresh projects 


Other researchers are finding similar trends. 
Cassidy Sugimoto, an information scientist at 
Indiana University Bloomington who studies 
gender disparities in research, conducted a sep-
arate analysis of author gender on nine popular
preprint servers (see go.nature.com/2xhxqxr). 
Methodological differences meant that the 
two analyses are not directly comparable, but 
Frederickson’s work “converges with what 
we're seeing”, says Sugimoto. 

Sugimoto points out that the preprints 
being published even now probably rely on 
labour that was performed many months ago. 
“The scientific publication process doesn’t 
lend itself to timely analyses,” she says. So her 
study also included databases that log regis- 
tered reports, which indicate the initiation of 
new research projects. 

In 2 of the 3 registered-report repositories, 
covering more than 14,000 reports with 
authors whose genders could be matched, 
Sugimoto’s team found a decrease in the 
proportion of submissions by female prin- 
cipal investigators from March and April 
of 2019 to the same months in 2020, when 
lockdowns started. They also saw a declining 
proportion of women publishing on several 
preprint servers, including EarthArXiv and 
medRxiv. These differences were more pro- 
nounced when looking at first authors, who 
are usually early-career researchers, than at 
last authors, who are often the most senior 
faculty members on a study.

“This is what’s the most worrying to me, 
because those consequences are long term,” 
Sugimoto says. “The best predictor of a 
publication is a previous publication.” 


Nature | Vol 581 | 28 May 2020 | 365 


News in focus 


FEWER NEW PROJECTS 


Women are registering a smaller proportion of research projects than before the 
pandemic, according to an analysis of registered-report repositories.

In economics, too, there are indications
that the pandemic is disproportionately 
affecting younger researchers, says Noriko 
Amano-Patiño, an economist at the Univer-
sity of Cambridge, UK. Taken as a whole, there
aren't clear discrepancies in the overall num- 
ber of working papers — a preprint-like publi- 
cation format in economics — that have been 
submitted to three major repositories, or in 
invited commentaries submitted to a fourth 
site that publishes research-based policy anal- 
yses (see ‘Fewer new projects’). 


Academic responsibilities 
Amano-Patiño and her collaborators
also examined who was working on pan- 
demic-related research questions using a 
COVID-19-specific repository (see go.nature.com/36sj2cm).
Although women have con-
sistently authored about 20% of working
papers since 2015, they make up only 12% of 
the authors of new COVID-19-related research. 
Amano-Patiño suspects that, in addition to
their childcare responsibilities, early- and 
mid-career researchers, especially women, 
might be more risk-averse and thus less likely 
to jump into a new field of research. “Mostly 
senior economists are taking their bite into 
these new areas,” says Amano-Patiño. “And
junior women are the ones that seem to be 
missing out the most.” 

“Unfortunately, these findings are not 
surprising,” says Olga Shurchkov, an econo- 
mist at Wellesley College in Massachusetts. 
Shurchkov came to similar conclusions in a
separate analysis of economists’ productivity
during the pandemic (see go.nature.com/2zyuebi).
And a 13 May arXiv preprint (J. P. Andersen et al.
Preprint at https://arxiv.org/abs/2005.06303; 2020)
shows the same
trends in pandemic-related medical litera-
ture (see ‘COVID-19 effect’). Compared with
the proportion of women among authors of
nearly 40,000 articles published in US medical
journals in 2019, the proportion on COVID-19
papers has dropped by 16%.

COVID-19 EFFECT

An analysis that looked at 13 medical journals
found that the proportion of female authors for
COVID-19 papers is lower than the average for
all studies published in 2019 in the journals.


Increased childcare responsibility is one 
issue. In addition, women are more likely 
than men to take care of ailing relatives, 
says Rosario Rogel-Salazar, a sociologist at 
the Autonomous University of Mexico State in 
Toluca. These effects are probably exacerbated 
in the global south, she notes, because women 
there have more children on average than do 
their northern counterparts. 

And women face other barriers to produc- 
tivity. Female faculty members, on average, 
shoulder more teaching responsibilities, so 
the sudden shift to online teaching — and the 
curriculum adjustments that it requires — dis- 
proportionately affects women, King says. And 
because many institutions are shut, non-re- 
search university commitments — such as par- 
ticipation in hiring and curriculum committees 
— are probably taking up less time. These are 
often dominated by senior faculty members, 
more of whom are men. As a result, men could 
find themselves with more time to write papers. 

Because these effects will be compounded 
as lockdowns persist, universities and funders 
should take steps to mitigate gender dis- 
parities as quickly as possible, Shurchkov 
says. “They point to a problem that, if left 
unaddressed, can potentially have grave 
consequences for diversity in academia.” 


THE QUEST TO ADDRESS
INEQUALITY DURING
THE PANDEMIC


Better data, testing and preparedness could reduce 
COVID-19’s outsized toll on people of colour. 


By Nidhi Subbaraman 


As figures emerge on the dispropor-
tionate toll that COVID-19 is taking on
people of colour in the United States, 
scientists are suggesting measures to 
help mitigate the inequalities. 

They say that better data are needed on the 
incidence of the disease, that testing needs to 
be ramped up and that hospitals serving peo- 
ple at risk need to prepare more effectively. 
Researchers and some US lawmakers are now 
calling for a national commission devoted 
to identifying racial disparities in health 
that would act as a unified voice in trying to 
overcome them. 

The unequal impact of the coronavirus is 
not solely a US problem, but the disparities 
are harshly felt in the United States, which 
currently has the highest number of COVID-19 
infections and deaths in the world.

The US Centers for Disease Control and
Prevention (CDC) started releasing death 
and infection rates broken down by race and 
ethnicity in late April, after a public outcry 
from lawmakers, doctors and civil-rights 
groups. 

The breakdowns were available for just 
35% of US deaths. But as these and other data 
start to come in, they paint a stark picture of 
disproportionate disease burden. 

Many of the causes of these health dis- 
parities are systemic and well known. “We're 
getting infected more because we are exposed 
more and less protected,” says Camara Phyllis 
Jones, an epidemiologist at the Rollins School 
of Public Health at Emory University in Atlanta, 
Georgia. Existing socio-economic and health 
disparities can at least partially explain why 
people of colour are getting ill and dying at 
disproportionate rates. 

A woman has her temperature taken in Compton, a city south of Los Angeles, California.

In many parts of the United States, people of
colour make up a high proportion of those in
some low-paid professions that have elevated 
risks of exposure to the virus — people who 
staff grocery stores, drive buses and work at 
food plants, for example. Also, COVID-19 is 
deadlier for people with chronic conditions, 
including diabetes, obesity and cardiovascu- 
lar disease. These have a higher incidence in 
many minority ethnic and racial groups in the 
United States. 


Reaching for solutions 


At the earliest stages of the epidemic, aspects 
of the US response might have made things 
worse, says Enrique Neblett, a psychologist 
who studies race and health at the University of
Michigan in Ann Arbor. With limited tests avail- 
able, US authorities initially reserved them for 
people with symptoms who had a recent history
of overseas travel. This could have excluded 
people from disadvantaged socio-economic 
backgrounds, including people of colour, says 
Neblett. “By having that as a criterion, they’re 
automatically less likely to be tested.” Looking 
ahead, adapting the approaches to getting tests 
to the people and communities that are most 
at risk should be a priority, he says. 

For example, in Louisiana — one of the first 
states to report data by race and ethnic group — 
testing teams went to poorer neighbourhoods 
to reach people without cars, who would have 
trouble making it to drive-through testing sites. 

Outreach must extend beyond testing, 
to all aspects of the response, says Evelynn 
Hammonds, a historian of medicine at Harvard
University in Cambridge, Massachusetts. Clin- 
ical trials, for example, must actively work to 
recruit a diverse population, otherwise the 
treatments and vaccines might not be equally 
effective. “We already know there’s a real prob-
lem with making sure that populations that are
enlisted into clinical trials need to be diverse,”
Hammonds says. 

And hospitals in neighbourhoods that are 
likely to see a surge in cases because they 
serve more severely affected groups should 
be equipped sufficiently, says Jones, who 
was president of the American Public Health 
Association in 2016. “If we know that these 
neighbourhoods are the ones being adversely 
impacted, then we need to move the ventila- 
tors and the health staff there,” Jones says. 


A coherent voice


Billions of federal dollars have been deployed 
in the United States to tackle the pandemic, 
but because such emergencies require a coor- 
dinated response across national agencies, 
some have argued for a central commission 
that will represent the needs of minority racial 
and ethnic groups. 

“The call is for a national coherent voice 
that starts to talk about these issues,” says 
Cato Laurencin, an orthopaedic surgeon and 
biomedical engineer at the University of Con- 
necticut in Farmington, who led a roundtable 
discussion on diversity at the US National Acad- 
emies of Science, Engineering, and Medicine. 

Hammonds says that such a group could be
effective because local US leaders typically 
prescribe public-health guidance and deci- 
sions, and so far have differed in their response 
to the pandemic. She compared New York’s 
rapid yet measured response with that in 
Georgia, where the governor announced a 
gradual re-opening of the economy in mid- 
April — despite advice against the move from 
public-health experts. This pandemic could be 
an opportunity to deploy consistent attention 
towards the needs of under-served communi-
ties, she says. “If there is a positive to come out
of this, that would be one of them.”

Q&A


Astronaut
with sights
on Mars


Jessica Watkins graduated as a member 
of NASA’s newest astronaut class in
January. As a planetary geologist, she is
a leading candidate to participate in the
agency’s Artemis programme, which aims 
to send people back to the Moon by the 
end of 2024. Further down the line, there 
might even be a trip to Mars, which she 
studied during her PhD. Nature spoke to 
Watkins about her career. 


Why did you join the astronaut corps? 

I have wanted to be an astronaut since I
was little. There was something that always 
pulled me towards space — the idea of 
exploration, of wanting to push boundaries 
and capabilities, both technically
and physically, but also mentally and
spiritually. I kind of stumbled into geology
and fell in love with that. And then the stars 
aligned for me to end up here. 


What's your favourite planet? 

Mars is definitely my first love. I remember
writing a book about a Martian in fifth
grade. What intrigued me the most about
Mars is how Earth-like it is, and how
we're able to use Earth as an analogue to
understand more about Mars. Now, given 
the direction that NASA is going in — we're 
talking about going back to the Moon in 
2024, through Artemis — the Moon has 
become a significant interest for me, as 
well. I’m definitely brushing up on lunar 
geology and what it’s going to be like on 
the surface. 


How can space exploration inspire us 
during the current public-health crisis? 
This pandemic is asking us to band 
together as humans, to do the right thing 
to help save each other. There’s something 
really analogous to human spaceflight in 
that. Human spaceflight is about humans 
pursuing hard things, doing it together, 
and doing it in spite of differences that we 
may have created. Having that perspective 
allows you to see Earth for what it is. It’s 
one body. We're all in this together. 


Interview by Alexandra Witze 
This interview has been edited for length 
and clarity.

Protesters rallying in Arizona against lockdowns held up signs carrying anti-vaccine messages and promoting unproven treatments.


BATTLING THE INFODEMIC


Researchers are analysing false rumours and disinformation about 
COVID-19 in hopes of curbing their spread. By Philip Ball and Amy Maxmen 


In the first few months of 2020, wild
conspiracy theories about Bill Gates and 
the new coronavirus began sprouting 
online. Gates, the Microsoft co-founder 
and billionaire philanthropist who has 
funded efforts to control the virus with 
treatments, vaccines and technology, 
had himself created the virus, argued one 
theory. He had patented it, said another. He’d 
use vaccines to control people, declared a third.
The false claims quietly proliferated among 
groups predisposed to spread the message — 
people opposed to vaccines, globalization or 
the privacy infringements enabled by technol- 
ogy. Then one went mainstream. 
On 19 March, the website Biohackinfo.com
falsely claimed that Gates planned to use a
coronavirus vaccine as a ploy to monitor 
people through an injected microchip or 
quantum-dot spy software. Two days later, 
traffic started flowing to a YouTube video on 
the idea. It’s been viewed nearly two million 
times. The idea reached Roger Stone — a 
former adviser to US President Donald 
Trump — who in April discussed the theory 
on a radio show, adding that he’d never trust
a coronavirus vaccine that Gates had funded. 
The interview was covered by the newspaper 
the New York Post, which didn’t debunk the 
notion. Then that article was liked, shared or 
commented on by nearly one million people 
on Facebook. “That’s better performance
than most mainstream media news stories,”
says Joan Donovan, a sociologist at Harvard 
University in Cambridge, Massachusetts. 
Donovan charts the path of this piece of dis- 
information like an epidemiologist tracking 
the transmission of a new virus. As with epi- 
demics, there are ‘superspreader’ moments. 
After the New York Post story went live, several 
high-profile figures with nearly one million 
Facebook followers each posted their own 
alarming comments, as if the story about Gates 
devising vaccines to track people were true. 
The Gates conspiracy theories are part of 
an ocean of misinformation on COVID-19 that 
is spreading online. Every major news event 
comes drenched in rumours and propaganda.
But COVID-19 is “the perfect storm for the
diffusion of false rumour and fake news”, says 
data scientist Walter Quattrociocchi at the 
Ca’ Foscari University of Venice, Italy. People
are spending more time at home, and search- 
ing online for answers to an uncertain and 
rapidly changing situation. “The topic is polar- 
izing, scary, captivating. And it’s really easy 
for everyone to get information that is con- 
sistent with their system of belief,” Quattro- 
ciocchi says. The World Health Organization 
(WHO) has called the situation an infodemic: 
“An over-abundance of information — some 
accurate and some not — rendering it difficult 
to find trustworthy sources of information and 
reliable guidance.” 

For researchers who track how information 
spreads, COVID-19 is an experimental subject 
like no other. “This is an opportunity to see 
how the whole world pays attention to a topic,”
says Renée DiResta at the Stanford Internet
Observatory in California. She and many oth- 
ers have been scrambling to track and analyse 
the disparate falsehoods floating around — 
both ‘misinformation’, which is wrong but not 
deliberately misleading, and ‘disinformation’, 
which refers to organized falsehoods that are 
intended to deceive. In a global health crisis, 
inaccurate information doesn’t only mislead, 
but could be a matter of life and death if people
start taking unproven drugs, ignoring pub- 
lic-health advice, or refusing a coronavirus 
vaccine if one becomes available. 

By studying the sources and spread of false 
information about COVID-19, researchers hope 
to understand where such information comes 
from, how it grows and — they hope — how to
elevate facts over falsehood. It’s a battle that 
can’t be won completely, researchers agree — 
it’s not possible to stop people from spreading 
ill-founded rumours. But in the language of 
epidemiology, the hope is to come up with 
effective strategies to ‘flatten the curve’ of 
the infodemic, so that bad information can’t 
spread as far and as fast. 


No filter 


Researchers have been monitoring the flow of 
information online for years, and have a good
sense of how unreliable rumours start and 
spread. Over the past 15 years, technology 
and shifting societal norms have removed 
many of the filters that were once placed 
on information, says Amil Khan, director 
of the communications agency Valent Pro- 
jects in London, who has worked on analys- 
ing misinformation for the UK government. 
Rumour-mongers who might once have been 
isolated in their local communities can con- 
nect with like-minded sceptics anywhere in 
the world. The social-media platforms they 
use are run to maximize user engagement, 
rather than to favour evidence-based infor- 
mation. As these platforms have exploded in 
popularity over the past decade and a half, so
political partisanship and voices that distrust
authority have grown too.

HIGHWAYS OF HATE

Neil Johnson at George Washington University in
Washington DC and his team mapped how malicious
content about a pneumonia-like disease, possibly
COVID-19, started on the forum 4chan in December. By
January, the content had spread to other social-media
platforms — Gab, Telegram and Facebook — through
links connecting pages on one platform with another.

To chart the current infodemic, data 
scientists and communications researchers 
are now analysing millions of messages on 
social media. A team led by Emilio Ferrara, a 
data scientist at the University of Southern 
California in Los Angeles, has released a data 
set of more than 120 million tweets on the 
coronavirus. Theoretical physicist Manlio
De Domenico at the Bruno Kessler Institute,
a research institute for artificial intelligence 
in Trento, Italy, has set up what he calls a 
COVID-19 “infodemic observatory”, using 
automated software to watch 4.7 million 
tweets on COVID-19 streaming past every 
day. (The actual figure is higher, but that 
is as many as Twitter will allow the team to 
track.) De Domenico and his team evaluate 
the tweets’ emotional content and, where pos- 
sible, the region they were sent from. They 
then estimate their reliability by looking at 
the sources to which a message links. (Like 
many data scientists, they rely on the work 
of fact-checking journalists to distinguish 
reliable news sources or claims from unrelia- 
ble ones.) Similarly, in March, Quattrociocchi 
and his co-workers reported a data set of
around 1.3 million posts and 7.5 million com- 
ments on COVID-19 from several social-media 
platforms, including Reddit, WhatsApp, Ins- 
tagram and Gab (known for its right-wing 
audience), from 1 January to mid-February.

A study in 2018 suggested that false news 
generally travels faster than reliable news on 
Twitter. But that isn’t necessarily the case
in this pandemic, says Quattrociocchi. His 
team followed some examples of false and 
true COVID-19 news — as classified by fact- 
checker sites — and found that reliable posts 
saw as many reactions as unreliable posts on 
Twitter. The analysis is preliminary and hasn't
yet been peer reviewed. 

Ferrara says that in the millions of tweets 
about the coronavirus in January, misinfor- 
mation didn’t dominate the discussion. Much 
of the confusion at the start of the pandemic 
related to fundamental scientific uncertain- 
ties about the outbreak. Key features of the 
virus — its transmissibility, for instance, and 
its case-fatality rate — could be estimated only 
with large error margins. Where expert scien- 
tists were honest about this, says biologist Carl 
Bergstrom at the University of Washington in 
Seattle, it created an “uncertainty vacuum” 
that allowed superficially reputable sources 
to jump in without real expertise. These 
included academics with meagre credentials 
for pronouncing on epidemiology, he says, 
or analysts who were good at crunching num- 
bers but lacked a deep understanding of the 
underlying science. 


Politics and scams 


As the pandemic shifted to the United States 
and Europe, false information increased, 
says Donovan. A sizeable part of the prob- 
lem has been political. A briefing prepared 
for the European Parliament in April alleged 
that Russia and China are “driving parallel 
information campaigns, conveying the over- 
all message that democratic state actors are 
failing and that European citizens cannot 
trust their health systems, whereas their 
authoritarian systems can save the world.”

The messages of US President Donald Trump
and his administration are sowing their own 
political chaos. This includes Trump’s insist- 
ence on referring to the ‘Chinese’ or ‘Wuhan’ 
coronavirus and his advocacy of unproven 
(and even hazardous) ‘cures’, and the allega- 
tion by US Secretary of State Mike Pompeo that 
the virus originated in a laboratory, despite the
lack of evidence. 

There are organized scams, too. More than 
68,000 website domains have been registered 
this year with keywords associated with the 
coronavirus, says Donovan. She’s reviewed 
ones that sell fake treatments for COVID-19, 
and others that collect personal information. 
Google’s search-engine algorithms rank infor-
mation from the WHO and other public-health
agencies higher than that from other sources, 
but rankings vary depending on what terms 
a person enters in a search. Some scam sites
have managed to come out ahead by using a 
combination of keywords optimized and tar- 
geted to a particular audience, such as newly 
unemployed people, Donovan says. 


Spreading agendas 

Many of the falsehoods online don’t have 
obvious sources or intentions. Rather, they 
often begin with niche groups mobilizing 
around their favoured agendas. Neil Johnson, 
a physicist at George Washington University in 
Washington DC, has reported COVID-19 mis-
information narratives taking shape among
online communities of extremist and far-right 
‘hate’ groups, which occupy largely unregu- 
lated platforms including VKontakte, Gab 
and 4Chan, as well as mainstream ones such 
as Facebook and Instagram. 

The study says that a “hate multiverse” is
exploiting the COVID-19 pandemic to spread 
racism and other malicious agendas, focusing 
an initially rather diverse and incoherent set 
of messages into a few dominant narratives, 
such as blaming Jews and immigrants for 
starting or spreading the virus, or asserting 
that it is a weapon being used by the “Deep 
State” to control population growth (see 
‘Highways of hate’). 

An alarming feature of this network is its 
capacity to draw in outside users through 
what Johnson and his team call “wormhole” 
links. These are shortcuts from a network 
engaged with quite different issues. The 
hate multiverse, the researchers say, “acts 
like a global funnel that can suck individuals 
from a mainstream cluster on a platform that
invests significant resources in moderation, 
into less moderated platforms like 4Chan or 
Telegram”. As a result, Johnson says, racist 
views are starting to appear in the anti-vaccine 
communities, too. “The rise of fear and mis- 
information around COVID-19 has allowed 
promoters of malicious matter and hate to 
engage with mainstream audiences around 
a common topic of interest, and potentially
push them toward hateful views,” his team says
in the paper.

A FACT-CHECKING FRENZY

Fact-checkers have worked overtime correcting COVID-19 falsehoods. One alliance has collated more than
6,000 examples of fact-checks across a broad range of categories since 14 January. Data as of 19 May.

[Chart: number of articles checked, 14 January to May 2020, by category: Authorities*, Causes, Conspiracy theory†, Cures, Spread, Symptoms, Other.]

*Stories about what governments and health authorities say, and what is alleged about them.
†Any category that involves a conspiracy theory.


Dangerous spread 


As misinformation grows, it sometimes 
becomes deadly. On Twitter in early March, 
technology entrepreneurs and investors 
shared a document prematurely extolling 
the benefits of chloroquine, an old malaria 
drug, as an antiviral against COVID-19. The 
document, which claimed that the drug had 
produced favourable outcomes in China and 
South Korea, was widely passed around even 
before the results of a small, non-randomized 
French trial of the related drug hydroxychlo-
roquine were posted online on 17 March. The
next day, Fox News aired a segment with one 
of the authors of the original document. And 
the following day, Trump called the drugs 
“very powerful” at a press briefing, despite 
the lack of evidence. There were small spikes 
in Google searches for hydroxychloroquine,
chloroquine and their key ingredient, quinine,
in mid-March — with the largest surge on the
day of Trump’s remarks, Donovan found using
Google Trends. “Just like toilet paper, masks
and hand sanitizer, if there was a product to be
had, it would have sold out,” she says. Indeed, it
did in some places, worrying people who need
the drugs to treat conditions such as lupus.
Hospitals have reported poisonings in peo-
ple who experienced toxic side effects from
pills containing chloroquine, and such a large
number of people with COVID-19 have been
asking for the drug that it has derailed clinical
trials of other treatments.

“We are producing much
more information than
what we can really parse and
consume.”

Fox News has been particularly scrutinized 
for its part in amplifying dangerous misinfor- 
mation. In a phone survey of 1,000 randomly 
chosen Americans in early March, commu-
nication researchers found that respondents
who tended to get their information from 
mainstream broadcast and print media had 
more accurate ideas about the disease’s 
lethality and how to protect themselves from 
infection than did those who got their news 
mostly from conservative media (such as Fox 
News and Rush Limbaugh’s radio show) or 
from social media. That held true even after 
factors such as political affiliation, gender, age
and education were controlled for. 

Those results echo another study, as yet not 
peer reviewed, in which economists at the Uni- 
versity of Chicago in Illinois tried to analyse the 
effects of two Fox News presenters on viewers’ 
opinions during February, as the coronavirus 
began to spread beyond China. One presenter, 
Sean Hannity, downplayed the coronavirus’s 
risk and accused Democrats of using it as a 
weapon to undermine the president; the other, 
Tucker Carlson, reported that the disease was 
serious. The study found that areas of the 
country where more viewers watched Hannity
saw more cases and deaths than did those
where more watched Carlson — a divergence 
that disappeared when Hannity adjusted his 
position to take the pandemic more seriously. 

De Domenico says he is encouraged that, 
as the crisis has deepened, so has many peo- 
ple’s determination to find more reliable 
information. “When COVID-19 started to hit 
each country, we have observed dramatic 
changes of attitude,” he says. “People started 
to consume and share more reliable news from
trusted sources.” Of course, the goal is to have
people listening to the best available advice 
on risk before they watch people die around 
them, Donovan says. 


Flattening the curve 


In March, Brazilian President Jair Bolsonaro 
began to spread misinformation on social 
media — posting a video that falsely said 
hydroxychloroquine was an effective treat- 
ment for COVID-19 — but was stopped in his 
tracks. Twitter, Facebook and YouTube took 
the unprecedented step of deleting posts from 
a head of state, on the grounds that they could
cause harm. 

Social-media platforms have stepped up 
their efforts to flag or remove misinforma- 
tion and to guide people to reliable sources. 
In mid-March, Facebook, Google, LinkedIn, 
Microsoft, Reddit, Twitter and YouTube 
issued a joint statement saying that they were 
working together on “combating fraud and 
misinformation about the virus”. Facebook 
and Google have banned advertisements for 
‘miracle cures’ or overpriced face masks, for 
example. YouTube is promoting ‘verified’ 
information videos about the coronavirus. 

Social-media platforms often rely on 
fact-checkers at independent media 
organizations to flag up misleading content. 
In January, 88 media organizations around 
the world joined together to record their 
fact-checks of COVID-19 claims in a database 
maintained by the International Fact-checking 
Network (IFCN), part of the Poynter Institute 
for Media Studies in St Petersburg, Florida 
(see ‘A fact-checking frenzy’). The database 
currently holds more than 6,000 examples, 
and the IFCN is now inviting academics to dig 
into the data. (Another site, Google’s fact-check 
explorer, records more than 2,700 fact-checks 
about COVID-19.) But some fact-checking 
organizations, such as Snopes, have admitted to
being overwhelmed by the quantity of informa- 
tion they are having to deal with. “The problem 
with infodemics is its huge scale: collectively, 
we are producing much more information than 
what we can really parse and consume,” says
De Domenico. “Even having thousands of pro- 
fessional fact-checkers might not be enough.” 

Communication scholar Scott Brennen at the
Oxford Internet Institute, UK, and his co-workers
have found that social-media companies
have done a decent job of removing misleading
posts, given the hard task. The team followed 
up 225 pieces of misinformation about the 
coronavirus that independent fact-checkers 
had collated in the IFCN or Google databases 
as false or misleading. In a 7 April report, the
team found that by the end of March, only 
some 25% of these false claims remained in 
place without warning labels on YouTube and 
Facebook, although on Twitter that proportion 
was 59% (see go.nature.com/2tvhuj5). And Ferrara
says that about 5% of the 11 million Twitter
users his team has studied so far in its COVID-19
database have been shut down for violating 
the platform’s policies of use, and that these 
tended to be unusually active accounts. 

But some creators of content have found 
ways to delay detection by social-media 
moderators, Donovan notes, in what she calls 
“hidden virality”. One way is to post content 
in private groups on Facebook. Because the 
platform relies largely on its users to flag up 
bad information, shares of misleading posts 
in private communities are flagged less often 
because everyone in the group tends to agree 
with one another, she says. Donovan used to 
study white supremacy online, and says a lot 
of ‘alt-right’ content wasn’t flagged until it 
leaked into public Facebook domains. Using 
CrowdTangle, a social-media-tracking tool 
owned by Facebook, Donovan found that more 
than 90% of the million or so interactions referring
to the New York Post article about the Gates
vaccine conspiracy were on private pages. 

Another way in which manipulators slip 
past moderation is by sharing the same post 
from a new location online, says Donovan. 
For instance, when people on Facebook 
began sharing an article that alleged that 
21 million people had died of COVID-19 in 
China, Facebook put a label on the article to 
indicate that it contained dubious information,
and limited its ranking so that it wasn’t
prioritized in a search (China has confirmed 
many fewer deaths: 4,638). Immediately, how- 
ever, people began posting a copy of the article 
that had been stored on the Internet Archive, a
website that preserves content. This copy was 
shared 118,000 times before Facebook placed 
a warning on the link. 

Quattrociocchi says that, faced with regu- 
lation of content on platforms such as Twitter 
and Facebook, some misinformation simply 
migrates elsewhere: regulation is currently 
worse, he says, on Gab and WhatsApp. And 
there is only so much you can do to police 
social media: “If someone is really committed,”
says Ferrara, “once you suspend them, they go 
back and create another account.” 

Donovan agrees, but argues that 
social-media companies could implement 
stronger, faster moderation, such as finding 
when posts that have already been flagged, or 
deleted, are revived with alternative links. In 
addition, she says, social-media firms might 
need to adjust their policies on permitting 
political discourse when it threatens lives. 
She says that health misinformation is increas- 
ingly being buried in messages that seem 
strictly political at first glance. A Facebook
group urging protests against stay-at-home
restrictions — Re-Open Alabama — featured a 
video (viewed 868,000 times) of a doctor say- 
ing that his colleagues have determined that 
COVID-19 is similar to influenza, and “it shows 
healthy people don’t need to shelter in place 
anymore”. Those messages could lead people
to ignore public-health guidance and endanger
many others, says Donovan. But Facebook has 
been slow to curb these messages because they
seem to be expressing political opinions. “It’s 
important to demonstrate to platform com- 
panies that they aren’t moderating political 
speech,” Donovan says. “They need to look at 
what kind of health misinformation backs their 
claims that restrictions are unjustified.” (Facebook
did not reply to a request for comment.)


Gaining trust 


Efforts to raise the profile of good information, 
and slap a warning label on the bad, can only go 
so far, says DiResta. “If people think the WHO 
is anti-American, or Anthony Fauci is corrupt, 
or that Bill Gates is evil, then elevating an alter- 
native source doesn’t do much — it just makes 
people think that platform is colluding with 
that source,” she says. “The problem isn’t a lack
of facts, it’s about what sources people trust.” 

Brennen agrees. “The people in conspir- 
acy communities think that they are doing 
what they should: being critical consumers 
of media,” he says. “They think they are doing 
their own research, and that what the consen- 
sus might advocate is itself misinformation.” 

That sentiment could grow if public-health 
authorities don’t inspire confidence when they 
change their advice from week to week — on 
facemasks, for example, or on immunity to 
COVID-19. Some researchers say the authori- 
ties could be doing a better job at explaining 
the evidence, or lack of it, that guided them. 

For now, US polling suggests that the 
public still supports vaccines. But anti-vac- 
cine protesters are making more noise. At 
rallies protesting against lockdowns in California
in May, for instance, some protesters
carried signs saying, “No Mandatory Vaccines”.
Anti-vaccination online hubs are leaping on 
to COVID-19, says Johnson. “It’s almost like 
they’ve been waiting for this. It crystallizes 
everything they’ve been saying.” 


Philip Ball is a science writer based in London, 
and Amy Maxmen is a senior reporter at 
Nature, based in San Francisco, California.

Garment workers in Bangladesh risk their lives in an industry devastated by the collapse in global spending.


COVID economics — first of many books 


Breakneck triage nails many diagnoses. Deeper treatment is needed. By Philip Ball 


There has never been a harder time to
be a political leader. The choices that
must be made are enormous, the consequences
potentially catastrophic,
the science guiding those decisions
uncertain — and there is no precedent. As a 
result, the COVID-19 pandemic has revealed 
some of the best and the worst in the world’s 
leaders: from opportunism and denial to com- 
passion and clarity. 

It’s a shame that policymakers did not have
books such as Joshua Gans’s Economics in 
the Age of COVID-19 to lay out the issues for 
them in January. It is remarkable that they do
already. Gans completed this book at breakneck
speed, by late March. His attempt to
explain the economic thinking that should 
guide policy is useful, but inevitably limited. 


Economics in the Age 
of COVID-19 

Joshua Gans 

MIT Press (2020).

With the situation and knowledge changing
daily, unfurling events will always render some 
aspects of such an analysis obsolete. In this 
sense, Gans, an economist at the University 
of Toronto in Canada, has taken a brave shot 
at an impossible task. Ultimately, economic 
thinking will need wider horizons. 

There has been a surge in COVID-19 cases among meatpackers in the United States.

The crisis has forced some politicians,
especially on the right, to go against deeply
held inclinations by implementing interventions
and financial handouts that, in normal
times, even most of their opponents would
deem excessive. Countries have tried to freeze
their economies and prop up the absence
of liquidity and wages with eye-watering
subsidies until the wheels start turning again. 
Therein lies the difference from the oft-cited 
comparison with wartime economics. In that 
situation, activity continues, but redirected. 
The present worldwide lockdowns have dras- 
tically shrunk the workforce. Aside from 
essential workers — in health, care, food and 
transport, say — only those jobs that can be 
done alone from home can safely continue 
(never have I felt luckier to be a writer). This has
sometimes been presented, too simplistically, 
as creating a choice between saving lives or 
saving the economy. As many countries have 
now passed the (first?) peak of infections, dis- 
cussion has turned to the dangers to health 
posed by an economy left too long in stasis. 
That discussion needs to happen, but it 
risks becoming facile, too. Presenting lives 
versus livelihoods as a dichotomy is used in 
defence of leaders who hesitated to impose 
a lockdown. That, Gans shows, is mistaken. 
The highly infectious nature and the fatality 
rate of COVID-19, which were both clear early 
on, even if exact numbers were not, meant 
there was never a gradual trade-off to be had: 
a dash more economy at the price of a few 
more deaths. “If you know you are going to 
shut down the country eventually, there are 
huge returns to doing it quickly,” Gans writes. 
It is the only way to keep choices open as more
is learnt about the virus and its spread.

This is not hindsight: Gans was writing while 
the UK and US governments were procrasti- 
nating. Nor is it just about saving lives in the 
short term. “Pursuing public health can be 
consistent with superior long-run economic 
performance,” Gans writes. 

And to be effective, that decision to shut- 
ter must be made with “resolve, clarity [and] 
transparency”. If leaders downplay the enormity
of the crisis, prevaricate or issue weak
behavioural guidelines — rather than expec- 
tations with consequences — then individuals 
will “do as they often do and pursue their own 
interest”, and will “keep businesses open and 
keep engaging in social life”. 

Then there’s the question of how to manage
the crisis in an economy on pause. Again, ide- 
ology might clash with reality. If you urgently 
need masks or ventilators, then there’s no time 
to put it out to tender and let market mecha- 
nisms make the choices about who gets the 
contract and the product. There must be
centralized decision-making and allocation,
even if that risks a degree of ‘inefficiency’. 

And how do you keep the economy in 
suspended animation, without the onset of 
necrosis? Governments have generally real- 
ized that they must help to cover lost wages, 
but the details are very tricky. The options of 
providing financial assistance to cover bills, 
such as rent and mortgages, suspending 
those costs or covering them directly are not 
equivalent. Gans says that the aim must be for 
payments, subsidies and loans to “ensure that 
people’s short-term disruptions are not trans- 
lated into long-term breakups”. One solution, 
he suggests, is repayment of government loans 
over time through taxation. 

Is the past any guide? Gans touches on 
the only comparable event in recent times, 
the 1918 influenza pandemic. The economic 
consequences of that were complicated 
because it occurred directly after a global 
war (mobilization of troops exacerbated that 
outbreak). Gans might also have mentioned 
the AIDS epidemic in Africa, which has in some 
regions been devastating enough to orphan 
generations, deplete the workforce and ham- 
per economic development. Prioritizing the 
economy over health is not necessarily taking 
the long view. 

Gans does a good job fleshing out the 
requirements for an exit strategy. He says we
now need to “invest in the testing economy”,
for example to establish who can safely return 
to work and to monitor workplace safety. 
“Countries and regions that were able to test, 
trace, and then isolate the infected were able 
to contain the virus quickly and reopen their 
economies sooner”, he points out, in a ringing
endorsement of World Health Organization 
policy. Even with a vaccine, he says, testing is 
likely to be a part of our daily lives for many 
years. 

He also offers a useful discussion of how to 
optimally allocate a vaccine when it has not 
been produced in sufficient quantities for all 
(although he does not go into the issue of people
refusing it, which is likely to be a problem).
And he considers how innovation in vaccine 
development can be motivated without reli- 
ance on market forces and patenting of what 
is so clearly a global public good. There is pre- 
vious discussion he might have drawn on here 
about the development of urgently needed 
drugs and treatments that seem unlikely to 
generate profits for pharmaceutical compa- 
nies, such as new antibiotics and treatments 
for tuberculosis. 

As others have done after previous 
outbreaks, Gans advocates establishing a 
pan-national institution with a “set of resources 
to contain future pandemics and ensure an 
international, harmonized response”. More 
like the International Monetary Fund than the 
WHO in his vision, this would focus not just 
on drugs but also on innovations to enhance 
protection from infection at work and on pub- 
lic transport. He calls “hundreds of billions of 
dollars per year to mitigate substantially the 
risk of global pandemics” a no-brainer, echo- 
ing those after the first Ebola outbreak who 
drew parallels with defence spending. 

Here, the book stops short. Realistically, 
Gans’s word was always going to be the first, 
not the last. But he paints a picture of a post- 
COVID-19 world that is largely back to nor- 
mal, with some inconveniences. The truth is 
that the pandemic throws much more into 
question. Whatever landscape emerges, it is 
unlikely to be the same as that at the end of 2019.

There is a moral case for rethinking inequalities
in light of what we have learnt about
who is truly essential for society’s functioning.
Some aspects of neoliberal economic policy 
are fundamentally in conflict with the needs 
of a fragile world, with greater risks to come. 
The behaviour of some leaders has pointed 
out the dangers of an information economy 
that has become a ‘marketplace for truth’. And 
economics itself must incorporate the revision 
of past preconceptions and habits that will be 
demanded of the rest of us. 


Philip Ball is a science writer and author based
in London. His latest book is How to Grow
a Human.

e-mail: p.ball@btinternet.com

Einstein on Einstein

Hanoch Gutfreund & Jürgen Renn Princeton Univ. Press (2020)

Albert Einstein admitted in his final essay, ‘Autobiographical sketch’,
that fellow physicists opposed his quest to unify the general theory
of relativity with quantum mechanics. But he took comfort from
philosopher Gotthold Lessing’s dictum: “The search for truth is more
precious than its possession.” The 1955 work appears in English for
the first time in this outstanding study of another essay, which Einstein
called his “obituary”: 1949’s ‘Autobiographical notes’. Physicist Hanoch
Gutfreund and historian Jürgen Renn provide a sparky commentary.


Forgotten Peoples of the Ancient World 

Philip Matyszak Thames & Hudson (2020) 

Western ideas on antiquity are dominated by Egyptians, Babylonians, 
Assyrians, Hebrews, Greeks and Romans, with other cultures often 
reduced to stereotypes. Historian Philip Matyszak asks: were the 
Philistines philistines and the Vandals vandals? His stimulating 
encyclopaedia of 40 “forgotten peoples” begins with the Akkadians 
around 2330 BC and ends with the Hephthalites (‘White Huns’) in the fifth
century AD. Illustrations include a Roman-style Vandal mosaic; far from 
vilifying Roman culture, the Vandals respected it, say current historians.

Good Work If You Can Get It

Jason Brennan Johns Hopkins Univ. Press (2020) 

Economist and strategist Jason Brennan delivers a data-driven, 
punchily practical guide to succeeding in academia, aimed at PhD 
students. He knows how success requires narrow professionalism, 
but he also networks widely. And if a PhD does not yield an academic 
job, all is far from lost. “Faculty jobs are the nail for which the PhD
is the hammer,” he concludes in the chapter ‘Exit options’. Yet the
hammer can be repurposed for diverse non-academic jobs: the
US unemployment rate for PhD holders is just 1.7%.


The Bearded Lady Project 

Eds Lexi Jamieson Marsh & Ellen Currano Columbia Univ. Press (2020) 
“Many of our Bearded Ladies became professional palaeontologists 
because they did not want to spend every workday inside, in an office, 
behind a desk,” writes palaeobotanist Ellen Currano, co-founder of
the Bearded Lady Project with film-maker Lexi Jamieson Marsh, in
their photo-biography of a weirdly compelling collaboration. It began
six years ago, out of despair at male dominance of their professions.
“Maybe I should sport a beard,” Currano joked. Dozens of female
geoscientists have now posed, artificially hirsute. 


Bite Back 

Eds Saru Jayaraman & Kathryn De Master Univ. California Press (2020) 
In this cleverly titled collection, attorney Saru Jayaraman and rural 
sociologist Kathryn De Master conclude that corporations control 
much of our food because of “their unbridled, unregulated power over 
our democracy”. Articles on seeds, labour, hunger and more describe 
calls to action and collective response, such as mobilization of New 
York state residents to force a ban on fracking, because of its potential 
to harm farms. Only direct public confrontation with corporate food 
elites will succeed, the editors argue.

Andrew Robinson

Obituary


Julian Perry Robinson 


(1941-2020) 


Chemist and lawyer who shaped international weapons conventions. 


In 1981, the US government publicly
accused Soviet-backed forces in southeast
Asia of waging toxin warfare and
violating their legal obligations under the
1925 Geneva Protocol and 1972 Biological
Weapons Convention. It alleged that aircraft 
dispersed ‘yellow rain’ containing mycotoxins 
that were “not indigenous to the region”. Julian 
Perry Robinson, working alongside biologist 
Matthew Meselson at Harvard University in 
Cambridge, Massachusetts, established that 
what actually fell was wild-honeybee faeces 
containing naturally occurring toxins. He died 
on 22 April, aged 78. 

This episode illustrates how Robinson 
helped to bring rationality into a field in 
which emotions often run high. His ideas 
influenced the negotiation and implemen- 
tation of international law. In a major 1970 
report for the World Health Organization, he 
began to articulate the idea that an ability to 
respond to natural-disease outbreaks could 
considerably diminish incentives to use chem- 
ical and biological warfare (CBW). He called 
for the strengthening of disease surveillance 
and other key areas in public health, and sug- 
gested that if there was to be “any chance of 
success”, a clear plan was needed for commu- 
nicating information with the public. His death 
from complications of COVID-19 is therefore 
particularly poignant. 

Robinson was born in Jerusalem in Novem- 
ber 1941. His interest in CBW began during 
the final year of his chemistry degree at the 
University of Oxford, UK. Working under 
the economist John Jewkes, his dissertation 
examined how the study of chemical warfare 
during the Second World War stimulated 
the synthesis of new types of organic com- 
pounds. On graduating, he spent four years 
with Kilburn & Strode Chartered Patent Agents 
in London before taking a position at the fledg- 
ling Stockholm International Peace Research 
Institute (SIPRI) in 1968, and becoming its focal 
point for CBW studies. From the mid-1970s, he 
was also central to the highly influential CBW 
Study Groups convened by the Pugwash Con- 
ferences on Science and World Affairs, which 
brought together scientists from east and west 
to discuss disarmament. 

At SIPRI he had a prominent role in a major 
review of CBW. This resulted in the classic 
six-volume study The Problem of Chemical 
and Biological Warfare (1971-75). He developed
the concept of a cross-cultural ‘taboo’
on such weapons, and detailed the process of
‘assimilation’ that leads to their acceptance
into existing military organizations. For example,
chemical weapons went unused during the
Second World War not because of deterrence,
but because specialized First World War-era
chemical warfare organizations were sidelined
by conventional military institutions.

378 | Nature | Vol 581 | 28 May 2020

He met Mary Kaldor, his partner of more 
than 50 years, at SIPRI, and they returned to 
the United Kingdom in 1971 to join the Science
Policy Research Unit at the University of
Sussex in Brighton. Robinson spent the rest 
of his career there establishing the Harvard 
Sussex Program, a group for research, commu- 
nication and training in support of informed 
public policy towards CBW issues that he 
co-directed with Meselson. There, he spot- 
ted another barrier to assimilation: existing 
technologies constrain emerging ones as they 
pass through a ‘weapons succession process’. 

Part of Robinson’s influence lay in identifying 
and seeking to close loopholes through which 
suppressed CBW technology might develop. 
He provided key inputs into the negotiations 
for the Biological Weapons Convention and 
the 1993 Chemical Weapons Convention. He
advocated for proper implementation of the
‘general purpose criterion’ — the concept that 
malign intentions rather than physical objects 
should be prohibited — as the main mechanism 
by which the treaties could avoid being under- 
mined by advances in science and technology. 

In the years just after the cold war, he 
stood firm against what he considered to be 
a “creeping legitimization” of those chemical 
and biological weapons not tooled for mass 
destruction (non-WMD CBW). In 2008, he 
warned that chemical weapons lent them- 
selves particularly to the new types of conflict 
experienced in places such as the Balkans, 
Afghanistan and parts of Africa, suggesting 
the taboo against CBW could become harder 
to maintain. 

These became the hallmark issues that he 
encouraged numerous research students 
to tackle. His mentoring of generations of 
scholars and policy practitioners means his 
ideas will continue to shape both treaties for 
decades. 

He officially retired from the University 
of Sussex in 2007, but continued to work at 
the Harvard Sussex Program, happiest when 
receiving visitors who came to discuss ideas 
and work in the extensive and unique archive on 
CBW issues that he and Meselson established. 
One of his last research projects resulted in a
series of vignettes from across history that he 
described as “lessons about CBW” for future 
generations. He also continued to write detailed 
chronologies — including one on ‘novichoks’, 
after the nerve agents were used in Salisbury, 
UK, in 2018. 

I became a student of Julian’s in 1996, 
and continued to work alongside him at the 
Harvard Sussex Program until he went into 
self-isolation. One of our last conversations 
was about establishing a small study group to 
consider whether his influential chronology 
on chemical weapons in Syria might reveal patterns
that those guarding against CBW should
know about. Always modest, he was hesitant
about whether people would spend time reading
it. When I suggested some might, he smiled
and said “Well now, there’s a thought. Let’s talk
more on the other side of all of this.” 


Caitriona McLeish is a senior fellow and 
co-director of the Harvard Sussex Program at 
the Science Policy Research Unit, University of 
Sussex, UK. 

e-mail: c.a.mcleish@sussex.ac.uk

A woman in Beijing shows a health QR code on her phone to access a shopping area, as a security guard checks her temperature.


Ten reasons why immunity 
passports are a bad idea 


Natalie Kofler and Françoise Baylis


Restricting movement on the
basis of biology threatens
freedom, fairness and
public health.


Imagine a world where your ability to get a
job, housing or a loan depends on passing
a blood test. You are confined to your home
and locked out of society if you lack certain 
antibodies. 

It has happened before. For most of the 
nineteenth century, immunity to yellow fever 
divided people in New Orleans, Louisiana, 
between the ‘acclimated’ who had survived
yellow fever and the ‘unacclimated’, who had
not had the disease. Lack of immunity dictated
whom people could marry, where they could 
work, and, for those forced into slavery, how 
much they were worth. Presumed immunity 
concentrated political and economic power 
in the hands of the wealthy elite, and was 
weaponized to justify white supremacy. 
Something similar could be our dystopian 
future if governments introduce ‘immunity 
passports’ in efforts to reverse the economic 
catastrophe of the COVID-19 pandemic. The 
idea is that such certificates would be issued to
those who have recovered and tested positive 
for antibodies to SARS-CoV-2 — the coronavirus 
that causes the disease. Authorities would lift 
restrictions on those who are presumed to have
immunity, allowing them to return to work, to
socialize and to travel. This idea has so many
flaws that it is hard to know where to begin.

© 2020 Springer Nature Limited. All rights reserved.

On 24 April, the World Health Organization 
(WHO) cautioned against issuing immunity 
passports because their accuracy could not 
be guaranteed. It stated that: “There is cur- 
rently no evidence that people who have 
recovered from COVID-19 and have antibodies 
are protected from a second infection” (see 
go.nature.com/3cutjqz). Nonetheless, the idea 
is being floated in the United States, Germany,
the United Kingdom and other nations. 
China has already introduced virtual health 
checks, contact tracing and digital QR codes to 
limit the movement of people. Antibody test 
results could easily be integrated into this
system. And Chile, in a game of semantics,
says that it intends to issue ‘medical release
certificates’ with three months’ validity to
people who have recovered from the disease.

In our view, any documentation that limits 
individual freedoms on the basis of biology 
risks becoming a platform for restricting 
human rights, increasing discrimination and 
threatening — rather than protecting — pub- 
lic health. Here we present ten reasons why 
immunity passports won’t, can’t and shouldn’t
be allowed to work. 


Ten points 


Four huge practical problems and six ethical 
objections add up to one very bad idea.

COVID-19 immunity is a mystery. Recent data
suggest that a majority of recovered patients 
produce some antibodies against SARS-CoV-2. 
But scientists don’t know whether everyone pro- 
duces enough antibodies to guarantee future 
protection, what a safe level might be or how 
long immunity might last. Current estimates, 
based on immune responses to closely related 
viruses such as those that cause severe acute 
respiratory syndrome (SARS) and Middle East 
respiratory syndrome (MERS), suggest that 
recovered individuals could be protected from 
re-infection for one to two years. But if SARS- 
CoV-2 immunity instead mimics that seen with 
the common cold, the protection period could
be shorter. 


Serological tests are unreliable. Tests to 
measure SARS-CoV-2 antibodies in the blood 
can be a valuable tool to assess the prevalence
and spread of the virus. But they vary widely 
in quality and efficacy. This has led the WHO 
and former US Food and Drug Administration 
commissioner Scott Gottlieb to caution against 
their use in assessing individual health or 
immune status. Several available tests are sufficiently
accurate, meaning they are validated to
have at least 99% specificity and sensitivity. But
preliminary data suggest that the vast majority
aren't reliable. Low specificity means the test
measures antibodies other than those that are
specific to SARS-CoV-2. This causes false positives,
leading people to think they are immune
when they aren't. Low sensitivity means that
the test requires a person to have a high con- 
centration of SARS-CoV-2 antibodies for them 
to be measured effectively. This causes false 
negatives in people who have few antibodies, 
leading to potentially immune individuals 
being incorrectly labelled as not immune. 
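
The effect of imperfect specificity can be sketched with a short calculation. The snippet below is an illustration only, not part of the authors' analysis: it computes the share of positive results that are true positives, using the 99% sensitivity and specificity quoted above. The function name, the weaker test (90% sensitivity, 95% specificity) and the 3% prevalence of past infection are assumptions chosen for the example.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of positive test results that are true positives.

    All three arguments are fractions between 0 and 1.
    """
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Assumed prevalence of past infection (hypothetical value for illustration).
ASSUMED_PREVALENCE = 0.03

# A test validated to the 99% threshold mentioned in the text.
validated = positive_predictive_value(ASSUMED_PREVALENCE, 0.99, 0.99)

# A hypothetical weaker test; these figures are not from the article.
weaker = positive_predictive_value(ASSUMED_PREVALENCE, 0.90, 0.95)

print(f"Validated test: {validated:.1%} of positives are true positives")
print(f"Weaker test:    {weaker:.1%} of positives are true positives")
```

When few people have been infected, even a small false-positive rate can account for a sizeable share of all positive results.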


The volume of testing needed is unfeasible. 
Tens to hundreds of millions of serological tests 
would be needed for a national immunity cer- 
tification programme. For example, Germany 
has a population of nearly 84 million people, 
so would require at least 168 million serological
tests to validate every resident’s COVID-19
immune status at least twice. Two tests per
person are the minimum, because anyone who
tested negative might later become infected and 
would need to be retested to be immune certi- 
fied. Repeat testing, on no less than an annual 
basis, would be necessary to ensure ongoing 
immunity. From June, the German government
will receive 5 million serological tests a month
from the Swiss firm Roche Pharmaceuticals, a
leading supplier of one SARS-CoV-2 serological 
test that has been approved by regulators. This 
will allow only 6% of the German population to 
be tested each month. 

Even if immunity passports were limited 
to health-care workers, the number of tests 
required could still be unfeasible. The United 
States, for example, would need more than 
16 million such tests. At the time of writing, 
the US Centers for Disease Control and Preven- 
tion and US public-health laboratories have 
performed more than 12 million diagnostic 
tests for SARS-CoV-2 (3% of the total US pop- 
ulation; see go.nature.com/2wemdd2). Even 
South Korea, a country with high testing rates,
had managed to test only 1.5% of its population
by 20 May (see go.nature.com/2aztfvp). 


Too few survivors to boost the economy. The 
proportion of individuals known to have recov- 
ered from COVID-19 varies widely in different 
populations. Reports from hot spots in 
Germany and the United States suggest some 
locations could have recovery rates between 
14% and 30%. In New York state, for example, 
where 3,000 people were tested at random 
in grocery shops and other public locations,
14.9% had antibodies against COVID-19 (see
go.nature.com/2waaku9). But these seem to be
the exception. In an April press conference, the
WHO estimated that only 2–3% of the global
population had recovered from the virus. 

Low disease prevalence combined with 
limited testing capacity, not to mention 
highly unreliable tests, means that only a small
fraction of any population would be certified 
as free to work. Based on current numbers 
of confirmed US cases, for example, only 
0.43% of the population would be certified. 
Such percentages are inconsequential for the 
economy and for safety. A cafe can’t open and 
serve customers without risk if only a fraction 
of its staff are certified as immune. A shop can’t
turn a profit if only a minuscule proportion of
customers are allowed to enter. 


Monitoring erodes privacy. The whole point 
of immunity passports is to control movement.
Thus, any strategy for immunity certification
must include a system for identification and 
monitoring. Paper documentation could be 
vulnerable to forgery. Electronic documentation
integrated into a smartphone app would
be more resistant to fraud and more effective 
for contact tracing, retesting and updates of 
immune status. 

But electronic documents present a more 
serious risk to privacy. In some Chinese
provinces, QR codes on smartphones control
entrance into public places on the basis of the 
individual’s COVID-19 health status. However, 
these apps report more than COVID-19 infor- 
mation — including people’s locations, travel 
history, who they’ve come into contact with 
and other health information, ranging from 
their body temperature to whether they’ve 
recently had a cold. Taiwan is also using smartphone
apps with alert systems that are directly
linked to police departments. The United King- 
dom, United States and many other countries 
are testing various app options. Yet there’s 
no guarantee that the apps will recede when 
COVID-19 does. China has announced that ele- 
ments of its QR-code tracking system are likely 
to remain in place after the pandemic ends. 


Marginalized groups will face more scrutiny. 
With increased monitoring comes increased 
policing, and with it higher risks of profiling 
and potential harms to racial, sexual, reli- 
gious or other minority groups. During the 
pandemic, China has been accused of racially 
profiling residents by forcing all African 
nationals to be tested for the virus. In other 
parts of the world, people from Asia have faced 
spikes in racialized prejudice. 

Before this pandemic, stop-and-frisk laws 
in the United States already disproportion- 
ately affected people of colour. In 2019, 88% 
of people who were stopped and searched in 
New York City were African American or Latin 
American (go.nature.com/2jntjym). And dur- 
ing the pandemic, policing continues to target 
people from minority groups. Between mid- 
March and the start of May in Brooklyn, New 
York, 35 of the 40 people arrested for violating 
physical distancing laws were black.

These numbers are deeply concerning, 
but would be even more so if monitoring and 
policing for COVID-19 immunity were to be 
used for ulterior motives. For example, ‘digital 
incarceration’ has already increased in countries
such as the United States, Brazil and Iran,
where individuals have been released from
prison to minimize the spread of COVID-19 and
then monitored using digital ankle bracelets. 
In the United States, where people of colour 
are racially segregated by neighbourhood 
and disproportionately incarcerated, digital 
incarceration could be used to monitor large 
segments of certain communities. The risk 
would be even higher if digital monitoring 
were to be linked to immigration status.

Health-care workers in Munich, Germany, take blood to test for antibodies to SARS-CoV-2.


Unfair access. With a shortage of testing, many 
will not have access. Experience so far suggests
that the wealthy and powerful are more likely 
to obtain a test than the poor and vulnerable. 
In tiered health-care systems, these inequities 
are felt even more acutely. In early March, for 
example, when professional sports teams, 
technology executives and film celebrities were 
getting tested, dozens of US states were conducting
fewer than 20 tests per day (see
https://covidtracking.com/data). The very people who
need to get back to work most urgently, workers
who need to keep a roof over their head and
food on the table, are likely to struggle to get
an antibody test. Testing children before they
return to school could be a low priority, as would
testing retired older people and those who face 
physical, mental-health or cognitive challenges. 


Societal stratification. Labelling people on 
the basis of their COVID-19 status would create 
a new measure by which to divide the ‘haves’
and the ‘have-nots’ — the immunoprivileged 
and the immunodeprived. Such labelling is 
particularly concerning in the absence of a 
free, universally available vaccine. If a vaccine 
becomes available, then people could choose to
opt in and gain immune certification. Without
one, stratification would depend on luck, money
and personal circumstances. Restricting work, 
concerts, museums, religious services, restau- 
rants, political polling sites and even health-care 
centres to COVID-19 survivors would harm and 
disenfranchise a majority of the population. 

Social and financial inequities would be 
amplified. For example, employers wanting 
to avoid workers who are at risk of becoming 
unwell might privilege current employees who 
have had the disease, and preferentially hire 
those with ‘confirmed’ immunity. 

Immunity passports could also fuel divisions 
between nations. Individuals from countries 
that are unable or unwilling to implement immunity
passport programmes could be barred
from travelling to countries that stipulate
them. Already people with HIV are subjected
to restrictions on entering, living and working in
countries with laws that impinge on the rights of 
those from sexual and gender minorities — such 
as Russia, Egypt and Singapore. 


New forms of discrimination. Platforms for 
SARS-CoV-2 immune certification could easily 
be expanded to include other forms of personal 
health data, such as mental-health records and
genetic-test results. The immunity passports 
of today could become the all-encompassing 
biological passports of tomorrow. These 
would introduce a new risk for discrimina- 
tion if employers, insurance companies, 
law-enforcement officers and others could 
access private health information for their own 
benefit. Such concerns have been catalogued 
over the past few years in debates about who 
should have access to genetic information, as 
demand rises from clinicians, researchers, insurers,
employers and law enforcers, for example.


Threats to public health. Immunity passports 
could create perverse incentives. If access to certain
social and economic liberties is given only
to people who have recovered from COVID-19,
then immunity passports could incentivize
healthy, non-immune individuals to wilfully
seek out infection, putting themselves and others
at risk. Economic hardship could amplify
the incentive if an immunity passport is the only
way to a pay cheque. Individuals might obtain
documents illicitly, through bribery, transfer 
between individuals or forgery. These could 
create further health threats, because people 
claiming immunity could continue to spread 
the virus. Crises tend to foster nefarious trade, 
as happened during the Second World War when 
food rations in Britain caused the emergence of 
a robust underground exchange system.


Next steps 


Strategies that focus on the individual, using
conceptions of ethics rooted in libertarianism,
contradict the mission of public health. They
distract attention from actions that benefit all, 
such as funding international collaborations, 
practising effective public-health measures and 
redressing income inequity. In North America 
(and elsewhere), because of structural inequi- 
ties, people of colour are dying from COVID-19 
at much higher rates than are white people, and 
the virus is disproportionately affecting those 
who live in First Nations territories. Success 
depends on solidarity, a genuine appreciation 
that we are all in this together. An ethic premised
on individual autonomy is grossly inappropriate
during a public-health crisis; the overarching 
aim must be to promote the common good. 
Instead of immunity passports, we contend 
that governments and businesses should invest 
available time, talent and money in two things. 
First is the tried and true formula of pan- 
demic damage limitation — test, trace and 
isolate — that has worked well from Singa- 
pore and New Zealand to Guernsey and Hanoi. 
Health status, personal data and location 
must be anonymized. Apps that empower 
individuals to make safe choices about their 
own movements should be prioritized. 
Second is the development, production and
global distribution of a vaccine for SARS-CoV-2.
If universal, timely, free access to a vaccination
becomes possible, then it could be ethically
permissible to require vaccine certification for
participation in certain activities. But if access to
a vaccine is limited in any way, then some of the
inequities we highlight could still apply, as the
literature on uptake of other vaccines attests.
Threats to freedom, fairness and public 
health are inherent to any platform that is 
designed to segregate society on the basis of 
biological data. All policies and practices must 
be guided by a commitment to social justice.


Take lessons from cancer
evolution to the clinic 


Charles Swanton 


The first long-term study 

of how lung cancer evolves 
is revealing that therapies 
targeting multiple proteins 
in tumour cells could help to
outpace the disease. 


How non-small-cell lung cancer evolves
in individual patients is being stud- 
ied in a project called TRACERx 
(Tracking Cancer Evolution through 
Therapy), the first such large-scale, 
longitudinal study to do so. Insights from 
the project so far are compiled this week in 
a collection in Nature (www.nature.com/ 
collections/haffgaicaf). Started in 2014, the 
project aims to follow 840 people being 
treated at 14 UK National Health Service
hospitals, from diagnosis through to cure or
relapse and, tragically, death (see
go.nature.com/2vkxwdy). Some tantalizing results have
begun to emerge. These help to explain why 
early diagnosis is so crucial for effective treat- 
ment, as clinicians have noted for decades. 
Most of researchers’ understanding of 
cancer evolution has come from observing 
cellular changes in biopsies of tumours or alterations
in DNA from tumour cells. Such samples
are typically taken for diagnosis, and provide 
only snapshots of a complex process. By con- 
trast, TRACERx aims to monitor this process 
as it unfolds. The study compares molecular 
changes in the responses of people’s tumour 
cells and in immune cells, across multiple 
tumour regions and as the tumour progresses. 
Scientists in the TRACERx project, for which 
I am chief investigator, hope that clinical practice
might soon benefit from the insights
emerging, combined with those from previous 
studies on cancer evolution. As well as explaining
a person’s immune response against their
cancer, the study could guide treatments for
targeting the many genetic mutations that 
accumulate in late-stage cancers that are dif- 
ficult to treat. More than 80% of people who 
are diagnosed with stage-four lung cancer do 
not survive for more than five years. 


Project approach 


Before the COVID-19 pandemic, the TRACERx 
project had recruited 760 people with 
early-stage lung cancer. After a person is 
diagnosed with a primary lung tumour, it is 
surgically removed and the cells are analysed 
to reconstruct the tumour’s evolutionary 
history. Each individual receives a computed 
tomography (CT) scan every year for five years 
to check whether their cancer has returned. If 
there is no sign of relapse, they are discharged 
and deemed to have been cured. People with 
later-stage tumours (stages 2 and 3) are offered 
chemotherapy following surgery to improve 
the chance of remission or cure. 

Analysis of tumours from the first 100 people 
enrolled in the study has revealed many genomic
changes. These include chromosome deletions 
and duplications, and even the doubling of 
whole genomes in nearly three-quarters of 
tumours, a feature of many cancers. Point
mutations in DNA, arising from single changes
in the genome sequence, were also prevalent.
These occurred as a result of tobacco exposure 
and the activity of enzymes called cytidine 
deaminases, which normally deactivate invad- 
ing viruses as part of the immune response. 

Another finding is that whole-genome 
doubling often occurs early on in those with 
lung cancer who have a history of smoking.
This doubling seems to protect the genes 
needed for the tumour’s survival in the face 
of the excessive mutations and chromosomal 
losses that occur in its genome as it develops.
Strikingly, mutations induced by smoking 
tend to dominate the ‘trunk’ of the tumour’s 
evolutionary tree. These are known as founder 
or truncal mutations, and are present in every 
tumour cell. For the most common type of non- 
small-cell lung cancer — adenocarcinomas 
that form in mucus-secreting glands — the 
number of smoking-related mutations in the 
trunk correlates with the number of cigarettes 
that the person has smoked. As the cancer 
advances, cytidine deaminases cause haphaz- 
ard mutations to accumulate in some cells; we 
refer to these as branched mutations.

With a view to devising immunotherapy,
we also investigated the DNA sequences of 
receptors on T cells, a type of white blood cell 
that fights off infection and emerging can- 
cers. We were surprised to discover that the 
sequences of T-cell receptors evolved in par- 
allel with the tumour. (The tumour eventually 
adapts to its immune environment and avoids
destruction.) One possible explanation for
the tumour’s ability to evade the immune system
is the chromosomal instability of tumour
cells. This causes the tumour cells to lose
immune-recognition molecules (called HLA, for
human leukocyte antigen) on their surface, preventing
T cells from homing in on the tumour.
This unstoppable genetic diversification 
can outwit cancer drugs. So our aim is to boost
the immune response to target the mutations 
present in every tumour cell and prevent the 
development of drug resistance. 


Steer for clinicians 


How might the discoveries from TRACERx and 
previous studies on cancer evolution guide 
clinical practice? 

The overriding message is that it will take 
multiple approaches to outpace cancers that 
have sophisticated evolutionary mechanisms, 
such as lung cancer. Each approach will have 
to focus on different stages of the disease to 
improve outcomes. 

For example, lung-cancer screening should 
be rolled out nationwide: earlier diagnosis 
improves prognosis. In the United Kingdom, 
lung-cancer screening so far occurs in only a few
regional centres. On the basis of results from a
clinical trial in the Netherlands, I calculate that
screening could prevent approximately 3,000–5,000
deaths from lung cancer in the United
Kingdom every year (unpublished work). 

In the Netherlands trial, 13,195 male and 
2,594 female former or current smokers aged 
50 to 74 were split at random into two groups. 
One group received CT screening when they 
were diagnosed and at years 1, 3 and 5.5; the 
other received no screening. After 10 years of
follow-up, there were 2.5 deaths per 1,000 person-years
in the screened group. (Put another
way, this means there would be, on average, 
2.5 deaths from lung cancer in the screened 
group if 1,000 people in the Netherlands were 
observed for one year.) There were 3.3 deaths 
per 1,000 person-years in the control group.
This equates to a 24% reduction in the chance of 
death from lung cancer if CT screening is done. 
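
The 24% figure follows directly from the two death rates quoted above. The snippet below is a minimal sketch of that calculation; the function names are hypothetical and the 10,000 person-years used in the second step is an assumed illustration, not a figure from the trial.

```python
def relative_risk_reduction(control_rate, screened_rate):
    """Fractional reduction in the death rate relative to the control group."""
    return (control_rate - screened_rate) / control_rate


def expected_deaths(rate_per_1000_person_years, person_years):
    """Expected deaths for a given amount of follow-up at a given rate."""
    return rate_per_1000_person_years * person_years / 1000.0


CONTROL_RATE = 3.3   # lung-cancer deaths per 1,000 person-years, no screening
SCREENED_RATE = 2.5  # lung-cancer deaths per 1,000 person-years, CT screening

reduction = relative_risk_reduction(CONTROL_RATE, SCREENED_RATE)
print(f"Relative reduction in lung-cancer deaths: {reduction:.0%}")

# Assumed follow-up of 10,000 person-years, for illustration only.
avoided = expected_deaths(CONTROL_RATE, 10_000) - expected_deaths(SCREENED_RATE, 10_000)
print(f"Deaths avoided per 10,000 person-years: {avoided:.0f}")
```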

Treatments that are tailored to an individu- 
al’s needs can also improve their prognosis. For 
example, some people benefit from chemother- 
apy after surgery to destroy any residual tumour 
cells following removal of the primary tumour. 
For every 100 people who are given chemo- 
therapy after surgery, between 5 and 15 extra 
patients will be alive (depending on tumour 
stage) after 5 years, compared with those who do 
not receive chemotherapy. Currently, doctors
cannot predict who those individuals might be. 

The TRACERx study has come up with a 
sensitive method to predict who is most likely 
to relapse following surgery — by identifying 
truncal DNA mutations that are common to 
all lung-cancer cells in a patient’s tumour. 
This tumour DNA appears in the blood as 
circulating free DNA after surgery — a sig- 
nal of minimal residual disease. Our pat- 
ented sequencing approach seeks residual 
tumour DNA harbouring these truncal mutations
and is now ready for clinical trials
(see competing-interests declaration at
go.nature.com/2w214yu). These trials could 
test the hypothesis that post-operative chemo- 
therapy is beneficial only to patients who have 
tumour DNA in their blood after surgery, and 
therefore have minimal residual disease. 

The TRACERx study also offers insight into 
how immunotherapies for lung cancer could 
be refined. Currently, the malignant cells 
that some cancer therapies target carry one 
mutant protein on their surface that drives 
tumour expansion, such as a growth-factor 
receptor that is jammed in the ‘on’ position. 
In most people, cancer cells become resist- 
ant to such therapies after 18 months, through 
acquired resistance mutations in the receptor 
or the downstream pathway. Knowing that the 
immune system adapts rapidly in tandem with
the tumour, clinicians could now target not 
just one but many mutant cancer-cell pro- 
teins that are present in every tumour cell, 
and which occur early in tumour evolution 
(truncal mutations). This might decrease the 
chances of resistance developing, and could 
have the advantage of not harming normal tis- 
sue. It might also provide a way to keep abreast 
of complex, fast-evolving cancers. 

Clinical trials are now exploring how to 
exploit these truncal mutations, by extracting 
the T cells that recognize them from a patient 
with cancer, culturing the cells in the laboratory
and transferring them back into the patient to 
amplify his or her immune response against 
the truncal mutations. Other UK researchers 
hope to assess the effectiveness of these cells 
in limiting tumour growth and spread (see 
competing-interests declaration). 


Future challenges 


Despite the promise set out here, many 
scientific and clinical challenges remain when 
it comes to cancer evolution. 

TRACERx and other studies have revealed 
the complex interactions between malig- 
nant tumour cells and normal cells in the 
surrounding environment. They can even
come to depend on each other for survival and
growth. Our understanding of these dynamics 
is still rudimentary. To assess them over space 
and time, a patient’s tumour would need to be 
sampled frequently, which is unethical and 
not possible. The TRACERx project includes 
a national autopsy study called PEACE that 
helps to resolve this problem and sample more 
deeply (see go.nature.com/2kutvyr). 

Scientists on the PEACE study are analysing 
the tumour material of 20 patients from 
TRACERx who gave consent for their tissues 
to be used in research after their death. Work in
progress is beginning to reveal how cancer cells 
spread from the primary tumour to distant 
sites in the body. TRACERx is also illuminat- 
ing how cells that are shed from the tumour at 
surgery (circulating tumour cells) reflect the
future site of spread months later. Developments
in single-cell sequencing technologies
will enhance our understanding of this process. 

Animal models that are used to probe and 
manipulate cancer evolution can mimic cer- 
tain key features, but they do not completely 
mirror human disease. Mouse lung-cancer 
cells, for instance, carry just 1% of the muta- 
tions found in cancer cells of people who 
smoke or have smoked. There are no mouse
lung-cancer models for tumour mutations that 
arise from the enhanced activity of cytidine 
deaminase. Moreover, in mice, the immune 
mechanisms that evade cancer cannot yet be 
effectively manipulated in parallel with an 
evolving human tumour that has many muta- 
tions. Researchers need to develop mouse 
models that better mimic how human lung 
cancers evolve and evade the immune system. 
Gene-editing tools such as CRISPR-Cas9 could 
help here. Such models would be valuable for 
drug development and preclinical testing. 

Approaches are beginning to emerge to 
image ‘normal’ cells in the microenvironment 
of a growing tumour; some of these techniques
are being used by the TRACERx consortium. 
Advances in sequencing nucleic acids and proteins
from single cells (preferably performed
in situ without disrupting the tumour architecture)
should help to decipher details of the
myriad steps in cancer evolution.

Multiple efforts worldwide provide hope 
for the future of lung-cancer treatment. These 
include programmes to minimize tobacco use, 
more CT screening to detect early lung cancers, 
clinical trials focusing on minimal residual dis- 
ease, and developments in immunotherapy for 
advanced tumours. This combination holds 
much promise for improving the survival and 
quality of life for patients with lung cancer.


Readers respond


Correspondence 


Rwanda’s success in 
tackling COVID-19 


Rwanda’s strong health-care 
system and strictly coordinated 
prevention measures against 
COVID-19 have helped the 
country to record zero deaths 
from the disease so far. As the 
pandemic threatens to gather 
momentum in Africa, other 
governments there could benefit 
from lessons we have learnt. 

Rwanda implemented 
full lockdown a week after 
its first case was reported in 
mid-March. A week later, it set 
up a contact-tracing system and
implemented testing for all staff 
policing borders, as well as those 
working in public spaces such 
as banks and bars. By the end of 
April, 29,395 citizens had been 
tested for COVID-19 (prevalence 
was 0.7%). The nation’s 
community health network has 
enabled the government — with 
help from the private sector — to 
identify populations in need of 
extra support. 

Africa has so far recorded 
relatively few cases and deaths 
compared with other continents 
(https://covid19.who.int). Strict 
prevention measures that are 
coordinated across countries 
could keep it that way. Regional 
bodies such as the East African 
Community should agree 
guidelines for full lockdown, 
backed by surveillance and 
a supranational testing 
laboratory, and follow up with 
population-impact surveys for 
mental health and COVID-19 
serological status. 


Jeanine Condo University of


Developing world:
boost modelling 


Computational models
of the likely spread of the
SARS-CoV-2 coronavirus have 
been instrumental in guiding 
governments’ strategies to
limit disease transmission and 
control the current public- 
health crisis. In developing 
countries, where the pandemic 
is potentially at its most 
dangerous and costly, we call 
for governments to work with 
academic institutions to build 
and sustain modelling capacity. 

Models are not silver bullets 
for fixing the ills of developing 
countries. Nevertheless, 
partnerships with international 
academic modelling 
communities (see, for example, 
F. Squazzoni et al. J. Artif. Soc.
Soc. Simul. 23, 10; 2020) and the 
participation of stakeholders 
and experts from different 
disciplines could help to build 
useful models. 

Such models would combine 
knowledge of computational 
techniques with local 
contextual knowledge of social 
processes. They would enable 
policymakers to distil choices 
from uncertainties, particularly 
when stakes are high and 
resources limited. Once set up, 
they could be used in times of 
both crisis and calm. 


Kaveri Iychettira, Afreen Siddiqi
Belfer Center for Science and 
International Affairs, Harvard 
Kennedy School, Cambridge, 
Massachusetts, USA. 
kaveri_iychettira@hks.harvard.edu


Spotlight on figures
for COVID-19 


Given the importance of 
accurate reporting of COVID-19 
cases and deaths to strategies 
for virus control, we screened 
countries’ daily records for 
possible misreporting by 
applying Benford’s law. This 
can pick up unreliable numbers 
resulting from error, oversight 
or manipulation, for instance — 
although it cannot distinguish 
their possible causes. 

Benford’s law predicts the 
relative frequency distribution 
of first digits in real-world 
number sets (see M. Sambridge 
et al. Geophys. Res. Lett. 37, 
L22301, 2010). Anomalies have 
exposed financial fraud, for 
example (see M. J. Nigrini J. Am.
Tax Assoc. 18, 72–91; 1996).
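
For readers unfamiliar with the law, the expected share of numbers whose first digit is d is log10(1 + 1/d). The sketch below is an illustration under stated assumptions, not the procedure used for this letter, whose exact test statistic is not given here: it compares observed first-digit frequencies with the Benford expectation using a simple maximum absolute deviation. The function names are hypothetical, and the sample data are powers of two, a sequence commonly used to demonstrate the law, rather than any country's case counts.

```python
import math
from collections import Counter


def benford_expected():
    """Expected frequency of each leading digit 1-9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(value):
    """Leading decimal digit of a non-zero integer."""
    return int(str(abs(int(value)))[0])


def first_digit_frequencies(values):
    """Observed frequency of each leading digit, ignoring zeros."""
    digits = [first_digit(v) for v in values if int(v) != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def max_abs_deviation(values):
    """Largest gap between observed and expected leading-digit frequencies."""
    expected = benford_expected()
    observed = first_digit_frequencies(values)
    return max(abs(observed[d] - expected[d]) for d in range(1, 10))


# Synthetic demonstration data (assumption): powers of two, not real records.
powers_of_two = [2 ** k for k in range(1, 201)]
print(f"Max deviation from Benford: {max_abs_deviation(powers_of_two):.3f}")
```

A large deviation would flag a series for closer inspection; as the letter notes, it cannot by itself say whether error, oversight or manipulation is responsible.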

We tested data reported by
51 countries from 16 January
until 9 April 2020, when case 
numbers were still on the rise. 
Once these start to level out, 
as was the case for China and 
South Korea over that period, 
Benford’s law can no longer be 
applied. 

We found that records of 
cumulative infections and 
deaths from the United States, 
Japan, Indonesia and most 
European nations adhered 
well to the law (see
go.nature.com/2kqtut2) and therefore
are consistent with accurate 
reporting. Figures from a few of 
the countries analysed reveal 
anomalies. These could be 
explained by limited data sets 
or by adjustments to avoid 
headline-grabbing numbers 
of deaths in the hundreds or 
thousands. 


Can public trust 
coronavirus apps? 


On 6 April, we approached 
the Belgian government 
with concerns about the 
improper use of contact- 
tracing smartphone apps for 
controlling pandemics. These 
concerns were in line with those 
you discuss (Nature 580, 563; 
2020). On 17 April, we drew its 
attention to other issues relating 
to lockdown exit strategies. 

We argued that contact- 
tracing apps could 
complicate rather than 
facilitate lockdown exit (see 
go.nature.com/36ebfmq). For 
example, receiving (or not) a 
warning through the app might 
elicit a false sense of security, 
or drive demand for testing 
that might not be available. And 
there is more at stake than the 
government’s public-health
efforts and investment: an
app’s success also depends on 
personal, public and social trust. 

Governments need to engage 
stakeholders to co-design 
the app so that it aligns with 
local culture and connects 
with vulnerable populations. 
They also need to use proper 
information campaigns and 
human follow-up after issuing 
app warnings, and to ensure that 
the media accurately relay what 
the apps can and cannot deliver. 

For now, the Belgian 
government has paused its 
implementation of contact- 
tracing apps (see
go.nature.com/2zinmbb). If they pursue
the project, we hope it will 
incorporate the necessary 
caution and guidance for 
citizens.


Ability to understand
genomes scales up 


Deanna M. Church 


A massive genome-sequencing and analysis effort has 
produced the most comprehensive sets of data and tools for 
understanding human genetic variation so far. The resource 
will be invaluable to biologists of every stripe. See p.434,
p.444, p.452 & p.459


What do the differences between each person’s 
genetic code mean for their individual devel- 
opment and health? Several factors have 
hampered researchers’ ability to answer this 
question. First, understanding genetic variation 
requires analysing huge numbers of sequences, 
because we carry many rare variants. Most of
these have no effect, with just a few causing 
genetic diseases. Second, most of our under- 
standing of genetic variation has come from 
studying single nucleotide variants (SNVs), but
structural variants (more than 50 nucleotides
long) can have a larger impact on physiological
traits, and are major contributors to disease.
Third, we lack an understanding of variation 
outside protein-coding sequences. In four 
papers in Nature, the genome aggregation 
database (gnomAD) consortium sets out to
address these gaps in knowledge. 

The gnomAD project is the successor to 
the game-changing exome aggregation consortium
(ExAC) project, which catalogued
genetic variation in the protein-coding parts 
of the genome, called exomes, from more than 
60,000 people (Fig. 1). EXAC set a new stand- 
ard for harmonized analysis — bringing in data 
from diverse projects for reanalysis in a com- 
mon pipeline — and for data sharing. The ExAC 
data were available to scientists well before 
the project’s publication in 2016, and it has 
had a profound impact on how researchers, 
physicians and genetic counsellors interpret 
the genomes of people with genetic diseases. 

In the first of the current papers, Karczewski
et al. describe the gnomAD consortium’s collection
of 125,748 exomes and 15,708 whole
genomes. The move to sequencing whole 
genomes is especially exciting, because 
analysis of non-coding sequences provides 
information about both structural variation
and variation in DNA sequences that regulate
gene expression, as described in the companion
papers. The gnomAD resource includes




sequences from diverse populations, includ- 
ing individuals from Asia and Africa. However, 
as the authors note, representation from more 
diverse populations is still needed to obtain
the full spectrum of human variation and to 
capture more population-specific variation. 

Karczewski et al. went on to analyse the 
protein-coding variants in their data set. 
They built on a metric developed by the 
ExAC group to assess whether a gene can 
‘tolerate’ variants that are predicted to pre- 
vent the normal functions of the protein it 
encodes — that is, whether these predicted 
loss-of-function (pLoF) variants have little 
to no effect on physiology, or cause serious 
health issues or death. This type of analysis 
is useful because genes that are intolerant to 
LoF might be essential for life, or their muta- 
tion could cause genetic diseases. 

The ExAC metric measures how many 
pLoF variants are observed in a gene across 
a population, compared with how many are 
expected given the rate at which mutations 
arise in genomes throughout evolution. 
However, because pLoF variants are so rare, 
60,000 exomes was not enough to definitively 
say whether all the genes studied — particu- 
larly small genes — are intolerant to pLoF. The 
data were therefore expressed as the proba- 
bility that a given gene would tolerate pLoF. 
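The difficulty with small genes can be made concrete with a rough sketch. It assumes, purely for illustration, that the number of pLoF variants seen in a gene follows a Poisson distribution around the expected count; this is not the consortium's actual statistical model, and the function names and example counts are hypothetical.

```python
import math


def observed_expected_ratio(observed: int, expected: float) -> float:
    """Ratio of observed to expected pLoF variants in one gene."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def prob_zero_by_chance(expected: float) -> float:
    """Chance of seeing no pLoF variants in a fully tolerant gene.

    Assumes (for illustration only) Poisson-distributed counts.
    """
    if expected < 0:
        raise ValueError("expected count must be non-negative")
    return math.exp(-expected)


# Hypothetical expected counts for a small gene and a large gene.
small_gene = prob_zero_by_chance(2.0)   # roughly 0.14: zero is unremarkable
large_gene = prob_zero_by_chance(40.0)  # vanishingly small: zero is striking
```

Under this assumption, a gene with only two expected variants shows none about one time in seven even if it tolerates loss of function, so a larger cohort (which raises the expected count) is what sharpens the call.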

By contrast, the increased cohort size in 
gnomAD allows for a more-direct measure 
of gene tolerance to LoF. Karczewski et al. 
binned genes into ten groups according to 
the frequency of pLoF variants they contained 
compared with those expected, producing a 
Figure 1 | Cataloguing genetic variation in humans. a, In 2016, the exome aggregation consortium
(ExAC) catalogued sites in the protein-coding sequences (the exomes) of the human genome at which single
nucleotides could vary between individuals. The database was formed of exomes from 60,708 people.
b, Its successor, the genome aggregation database (gnomAD), includes 15,708 whole-genome sequences, in
addition to 125,748 exomes. The consortium catalogued not only single-nucleotide variants (SNVs) across
the whole genome, but also more-complex structural variants, which span 50 nucleotides or more. These
can include deletions, inversions or duplications of DNA.
spectrum of pLoF tolerance. The larger sample
size means that gene length is less of a prob- 
lem in the gnomAD analysis, but even so, the 
authors could not definitively assess pLoF fre- 
quency in the 30% of genes that were expected 
to have few pLoF variants. 
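The binning step can be sketched in a few lines. This is an illustrative ranking of genes by a simple observed-to-expected ratio, not the authors' pipeline; the gene names and ratios are invented, and the published analysis may rank genes on a different statistic.

```python
def bin_into_deciles(oe_by_gene: dict[str, float]) -> dict[str, int]:
    """Assign each gene to one of ten equal-sized bins (0 to 9).

    Genes are ranked by their observed/expected pLoF ratio, so bin 0
    holds the genes with the fewest pLoF variants relative to expectation.
    """
    ranked = sorted(oe_by_gene, key=lambda gene: oe_by_gene[gene])
    n = len(ranked)
    return {gene: (rank * 10) // n for rank, gene in enumerate(ranked)}


# Hypothetical genes with made-up observed/expected ratios.
example = {f"GENE{i}": i / 20 for i in range(20)}
bins = bin_into_deciles(example)
```

With twenty genes, each bin receives two; the least tolerant pair lands in bin 0 and the most tolerant pair in bin 9.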

Despite this limitation, the group used their
approach to gain fresh insights into the genet- 
ics of disease. For example, they found rare 
variants in genes that do not tolerate LoF 
more often in people who have an intellec- 
tual disability or autism spectrum disorder 
than in people who do not. These data might 
help researchers to understand the complex 
genetic structure that underlies these traits. 

In the second paper of the collection, 
Cummings et al. investigated why genes
that seem intolerant to pLoF can sometimes 
carry these variants with apparently little con- 
sequence. Genes can be transcribed in differ- 
ent ways, with some protein-coding regions 
(exons) expressed only in a limited fashion.
Cummings and colleagues demonstrated that, 
when an individual carries a pLoF variant in 
an ‘intolerant’ gene, the variant is often in an 
exon that shows this restricted expression, 
thus limiting its effect. 

In the third paper, Minikel et al. assessed
how the pLoF database might improve our 
ability to identify genetic targets for drugs. 
The identification of individuals who carry 
two pLoF variants in a given gene is desirable 
in drug discovery — if these individuals also 
exhibit a change in a particular trait, it pro- 
vides evidence that the gene could be a good 
drug target. The group showed that there
are still many errors when identifying pLoF 
variants; that quality control is needed when 
identifying these variants; and that instances 
of an individual carrying two pLoF variants in 
the same gene are sufficiently rare that we will 
need cohorts roughly 1,000 times bigger than 
gnomAD to gather definitive evidence of their 
existence in most genes. 

One of the most exciting aspects of the 
gnomAD project is the production of a cata- 
logue of structural variants, described in the 
final paper by Collins and co-workers. There
have been excellent efforts at cataloguing 
structural variants using long-read sequencing 
technology. However, sample sizes have been
small, owing to the expense and lack of stand- 
ardized analysis pipelines for this approach — 
although I expect this situation to improve in 
the near future. By contrast, identifying struc- 
tural variants in short-read sequences is techni- 
cally challenging, because the variants are often 
larger than a typical short sequence read, and 
they can arise through a variety of mutational 
mechanisms, resulting in many variant types 
(duplication, deletion or inversion of DNA, for 
instance) that each leave different footprints in 
the genome. This has led to the development of 
many tools for identifying structural variants 
from short reads, but no ‘standard’ pipeline. 
Collins et al. sought to remedy this problem
by creating a pipeline that allows for harmo- 
nized analysis over thousands of genomes; 
this could become the industry standard 
for structural-variant detection from short- 
read sequences on a population scale. The 
authors generated a catalogue of more than 
300,000 high-quality structural variants — 
more than twice as many as previous analyses. 
They then began to assess the contribution 
of structural variants to physiological traits. 
This analysis revealed some evidence for 
natural selection against structural vari- 
ants in non-coding sequences that control 
gene expression. Unsurprisingly, selection 
against structural variants was stronger in 
protein-coding regions. This suggests that 
more variation is tolerated in non-coding 
than in coding regions, and that even-larger 
cohorts (or other approaches) will be needed 
to begin to robustly dissect non-coding vari- 
ation. The authors also found that structural 
variants account for roughly one-quarter of 
protein-truncating events. 

The routine analysis of structural variants, 
integrated with analysis of SNVs and gene 
expression, will be crucial for interpreting 
individual genomes. Collins et al. have taken 
an important step in this direction, and the 
gnomAD resource provides tools for others 
to continue on this path. 

An interesting, recurring theme in these 
papers is that — despite the size of the cohort 
— we still lack the numbers required for many 
analyses. The sequencing of ever-larger 
cohorts should no doubt continue. However, 
this approach alone will not enable us to fully 
understand the relationships between human 




genetics and traits at both cellular and organ- 
ismal levels. We need scalable approaches to 
program genetic variation into human cells, 
and well-characterized cellular traits that can 
be monitored to allow us to directly interro- 
gate the physiological impact of this variation. 
Such interventional biology will substantially 
augment population genetics and accelerate 
our understanding of human biology. 

The gnomAD consortium has already made 
its data publicly available. The impact that the 
project will have on science goes well beyond 
the current collection, which includes not only 
the papers in this issue, but several published 
in Nature's sister journals (go.nature.com/2zgfxr2).
The gnomAD resource, like ExAC before
it, will change how we interpret individual 
genomes. The consortium’s work has revealed 
how much information about human variation 
we had been missing and has provided tools 
that help us to better understand the genome 
at both the population and individual level. I
can’t wait to see what comes next. 
A model of perfection for
light-activated catalysts 


Simone Pokrant 


Efforts to make hydrogen from water directly using sunlight 
have been hampered by the inefficiency of the catalysts that 
promote the process. A model system demonstrates that 
almost perfectly efficient catalysts can be made. See p.411 


Since the emergence of Greta Thunberg’s 
‘Fridays For Future’ movement in August 
2018, the need to prevent climate change and 
to find ‘green’ alternatives to fossil fuels have 
become topics of broad public interest. But 
although public awareness has advanced rap- 
idly, progress in the search for cost-effective 
technological solutions has not. One prom- 
ising sustainable energy carrier is hydrogen, 
if it can be produced using renewable energy
sources — hydrogen is a green fuel, because 
its combustion produces only pure water. On 
page 411, Takata et al. report a breakthrough
in catalyst design that might accelerate the 
development of large-scale processes for 
making hydrogen from water using sunlight. 
The largest potential source of renewable 
energy is the Sun: about 0.02% of the solar
energy absorbed by Earth’s surface annually
would be enough to cover current global 
energy consumption. Many approaches for 
converting solar energy into the chemical 
energy stored in hydrogen are therefore being 
investigated, using ‘water-splitting’ reactions 
in which water is broken down into hydrogen 
and oxygen. Some of these approaches are 
already being tested in pilot facilities, such as
solar-power-to-gas units, in which electricity
produced by solar cells is used to split water
through electrolysis. Hydrogen produced in
this way could be used for the long-term stor- 
age of solar energy, or as fuel for vehicles. 
However, solar-to-gas conversion processes 
are not yet economically viable. 

A study of the technical and economic feasibility
of solar-energy production has shown
that systems based on light-activated catalysts 
(photocatalysts) are attractive alternative 
options for water splitting. In these systems, 
photocatalytic semiconductor particles are 
suspended in a bed filled with an aqueous 
electrolyte; when sunlight shines on the 
suspension, hydrogen and oxygen gases are 
produced. The technical simplicity of this 
approach should enable economically com- 
petitive hydrogen production, if the photo- 
catalyst can convert solar energy to hydrogen 
with a minimum efficiency of about 10%. 

However, the conversion efficiencies of 
photocatalytic semiconductors are typically 
much lower than 10%. This is because the 
photocatalytic process is highly complex 
and requires the semiconductor particles to 
have a combination of several properties. They
must: absorb light; generate and separate elec- 
tron-hole pairs (holes are positively charged 
quasiparticles produced when photons knock 
electrons out of an atomic lattice); enable 
holes and electrons to travel to the particle–water
interface; and catalyse the production
of hydrogen and oxygen from water (reactions 
that require electrons and holes, respectively). 
Side processes that can occur at each step can 
lower the overall conversion efficiency. Mate- 
rials scientists are therefore trying to design 
photocatalysts that minimize such efficiency 
losses. 

A key measure of the effectiveness of a 
photocatalyst is the fraction of absorbed 
photons that it can use to produce hydrogen, 
a quantity called the internal quantum efficiency
(IQE). A perfect photocatalyst that converts
all of the absorbed photons to hydrogen
would have an IQE of 1 (or 100%). However, IQE
cannot be determined from experiments.

A related quantity that can be experimentally
determined for a reaction is the external
quantum efficiency (EQE): the fraction of 
photons illuminating the reaction vessel that 
the photocatalyst can use to produce hydro- 
gen. This value is always lower than the IQE, 
because an unknown portion of the illumi- 
nating photons will not be absorbed by the 
Figure 1 | A photocatalytic particle engineered for water splitting. Particles of light-activated catalysts
(photocatalysts) can be used to drive water splitting, the reaction in which water is broken down into
hydrogen and oxygen gases. Takata et al. prepared photocatalysts made of highly crystalline strontium
titanate that contains a small number of aluminium atoms. When the particles are suspended in water and
irradiated with ultraviolet light, electrons and holes (positively charged quasiparticles) are produced,
and travel to different facets of the crystalline particle. The authors selectively deposited appropriate
co-catalysts on the facets to promote hydrogen production (using the electrons) at the electron-collecting
facets, and oxygen production (using the holes) at the hole-collecting facets. The selective migration of
electrons and holes to different reaction sites contributed to the almost perfectly efficient conversion of
light to hydrogen molecules by the photocatalyst.


photocatalyst, but will instead be lost to other 
processes, such as scattering. If similar photocatalyst-particle
suspensions are investigated
using the same experimental set-up, ensuring 
that the same fraction of light is absorbed, 
then EQE can be used as an indirect measure of 
IQE. But EQEs determined using different set-ups
cannot be used to compare the IQEs
of photocatalytic systems, because the relationship
between EQE and IQE is different for
each set-up. This makes it difficult for
different research groups to compare results.
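The link between the two quantities follows from their definitions: the EQE is the IQE multiplied by the fraction of illuminating photons that the photocatalyst actually absorbs. The sketch below illustrates why that makes set-ups hard to compare; the function name and the absorbed fractions are hypothetical, chosen only to show the effect.

```python
def external_quantum_efficiency(iqe: float, absorbed_fraction: float) -> float:
    """EQE implied by an IQE and the fraction of incident photons absorbed."""
    for name, value in (("iqe", iqe), ("absorbed_fraction", absorbed_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    return iqe * absorbed_fraction


# The same hypothetical catalyst (IQE of 0.80) measured in two set-ups
# that differ only in how much of the incident light is absorbed.
same_catalyst_iqe = 0.80
eqe_setup_a = external_quantum_efficiency(same_catalyst_iqe, 0.90)  # about 0.72
eqe_setup_b = external_quantum_efficiency(same_catalyst_iqe, 0.50)  # about 0.40
```

Identical catalysts therefore report different EQEs unless the absorbed fraction is held constant, which is the condition the text describes for using EQE as an indirect measure of IQE.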




Takata et al. focus on strontium titanate
(SrTiO3), one of the first materials found to
split water photocatalytically, as reported in
1977. Strontium titanate produces electron-hole
pairs by absorbing ultraviolet light.
Because the Sun’s intensity is highest in the 
visible-light range, it is unlikely that UV-driven 
catalysts will enable sustainable hydrogen production
on a large scale. However, strontium
titanate is an excellent model system for studying
the influence of photocatalyst parameters
on quantum efficiency (both EQE and IQE), 
because the mechanisms that cause efficiency 
losses in this material are well understood, and 
strategies for mitigating the losses have been
proposed. 

The authors used a combination of 
approaches to address specific loss mech- 
anisms. One such mechanism is charge 
recombination, a process in which electrons 
and holes recombine before they can take 
part in water splitting. Takata and colleagues 
suppressed recombination in several ways. 
The first approach was to improve the crystal- 
linity of the photocatalyst particles, thereby 
reducing the number of lattice defects. 
Another method was to reduce the number 
of chemical defects in the crystal lattice using 
aluminium doping — a process in which small 
quantities of aluminium atoms are incorpo- 
rated into the lattice. These two approaches 
work because any defect (a lattice defect or a 
chemical defect) can act as a potential centre 
for recombination.

Takata and colleagues also took advantage 
of the fact that electrons and holes in their 
strontium titanate crystals collect at differ- 
ent crystal facets — a feature that further sup- 
presses charge recombination. The authors 
selectively deposited appropriate co-catalysts 
on the facets to promote hydrogen produc- 
tion at the electron-collecting facets, and 
oxygen production at the hole-collecting 
facets (Fig. 1); this approach was previously 
proposed and developed by other research
groups. Finally, the authors prevented an 
unwanted side reaction (the oxygen-reduction 
reaction) by encasing the rhodium co-catalyst 
for the hydrogen-producing reaction in a
protective shell of a chromium compound.

This combination of complex mitigation 
strategies proved highly successful: the 
authors reported EQEs of up to 96% when their 
photocatalysts were irradiated with light in the 
wavelength range of 350-360 nanometres. 
This is excellent news, because it means they 
have designed an almost perfect photocatalyst 
— the IQE must be between 96% and 100%. 

This is a spectacular result for several 
reasons, even though strontium titanate is ‘just’ 
a model system for visible-light photocatalysts.
First, it demonstrates that experiments can 
be designed in which EQEs come close to IQEs 
within an acceptable error margin of less than 
4%. Improved experimental set-ups in which 
measured EQEs are very near to IQEs should 
facilitate the comparison of photocatalysts 
and therefore accelerate progress in this field. 
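The reasoning behind these bounds is short enough to write down. Because the photocatalyst cannot use more photons than it absorbs, a measured EQE is a floor for the IQE, and 100% is the ceiling. The following sketch, with hypothetical function names, applies that logic to the reported figure of 96%.

```python
def iqe_bounds(eqe: float) -> tuple[float, float]:
    """Lower and upper limits on IQE implied by a measured EQE."""
    if not 0.0 <= eqe <= 1.0:
        raise ValueError("eqe must lie between 0 and 1")
    return eqe, 1.0


def max_gap_between_eqe_and_iqe(eqe: float) -> float:
    """Largest possible difference between the true IQE and the measured EQE."""
    lower, upper = iqe_bounds(eqe)
    return upper - lower


lower, upper = iqe_bounds(0.96)          # IQE lies between 0.96 and 1.0
gap = max_gap_between_eqe_and_iqe(0.96)  # about 0.04, the 4% margin quoted
```

The closer a measured EQE comes to 100%, the narrower this window becomes, which is why a near-perfect model system is useful for benchmarking set-ups as well as catalysts.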

Second, it proves that the combination 
of design strategies used by the authors can 
indeed eliminate efficiency losses associated 
with recombination. It is to be expected that 
the strategies used to improve the efficiency 
of strontium titanate will also apply to photo- 
catalysts driven by visible light — and could 
therefore enable the conversion of solar energy 
to hydrogen with efficiencies of about 10%. 




Finally, and most importantly, Takata and 
colleagues’ findings will inspire and encourage 
other researchers to continue their work on 
photocatalysts. One of the authors of the work, 
Kazunari Domen, published his first paper on
the use of strontium titanate as a photocatalyst 
in 1980. This shows the timescale needed for 
success in this area. Although we do not yet 
have a route for the sustainable and econom- 
ically viable production of hydrogen, we stand 
a good chance of finding one in the next few 
decades. This paper vouches for it. 
Evolution of a
molecular machine


Michael Berenbrink 


The multi-subunit protein haemoglobin relies on complex 
interactions between its components to function properly. 
Analysis of ancient precursors suggests that its evolution 
from a simple monomer involved only a few steps. See p.480


The oxygen-transporting protein haemoglobin
has undergone repeated adaptations
as animals evolved to conquer new environments,
from the depths of the oceans to high
mountain ranges. These adaptations relied
on changes in the long-range interactions 
between oxygen-binding sites buried in the 
protein’s subunits, and between these regions 
and binding sites for a multitude of small effector
molecules on the protein’s surface. How
did this complex molecular machine, which 
can respond so exquisitely to available levels 
of both oxygen and several other effector 
molecules, come into being? On page 480, 
Pillai et al. reconstruct the stepwise evolution
of haemoglobin from precursors that existed 
more than 400 million years ago. 

Almost nothing was previously known about 
how the four-subunit (tetrameric) form of
haemoglobin that is found in modern-day 
jawed vertebrates evolved from ancient mono- 
mers. Tetrameric haemoglobin consists of two 




α- and two β-subunits. Pillai et al. computationally
reconstructed an evolutionary tree to
chart the protein’s ancient history, using the 
amino-acid sequences of a large collection of 
the closely related vertebrate globin proteins, 
which exist as either monomers or tetramers. 
The authors’ tree was constructed taking into
account that amino-acid substitutions a given 
protein shares with close relatives tend to have 
originated in more-recent common ancestors 
than have those it shares with more-distant 
relatives. The reconstructed evolutionary 
tree indicates that multiple rounds of gene 
duplication and subsequent divergence 
gave rise to the globin family and, by way 
of several ancestral proteins, to tetrameric 
haemoglobin (Fig. 1). 

What is special about the study is that Pillai 
and colleagues went on to resurrect several of
these extinct ancestral proteins, generating 
them from the amino-acid sequences pre- 
dicted by the tree. The group then tested these 
proteins’ functions. 

First, Pillai and colleagues analysed whether
each ancestral protein could form dimers
and tetramers of like or unlike subunits. The
earliest protein (a common ancestor of
haemoglobin and the monomeric globin protein
myoglobin, named AncMH by the authors)
exists only as a monomer. A later protein,
named Ancα/β, which is the ancestor of all
existing haemoglobin subunits, forms homodimers
when expressed at high levels. The
authors’ tree indicates that Ancα/β underwent
gene duplication to produce two proteins:
the ancestors of all existing α- or β-subunits,
which the group respectively named Ancα and
Ancβ. These proteins also form homodimers,
or even homotetramers, when expressed
alone. However, when the two are expressed
together in equal proportions, they can form
heterodimers, which then further align to yield
haemoglobin tetramers.

The group next investigated the oxygen- 
binding affinity of the ancestral proteins, 
along with their oxygen cooperativity (the 
ability of oxygen-binding subunits to interact 
with one another) and their ‘allosteric’ regula- 
tion by a potent, artificial effector molecule, 
inositol hexaphosphate (IHP). They found
that only Ancα and Ancβ, when expressed
together at high concentrations, show similar
oxygen-binding affinity, cooperativity and
allosteric regulation to today’s haemoglobin
protein. These features are shared by all living
jawed vertebrates, but are absent or achieved
in a different way in jawless vertebrates, whose
haemoglobin proteins are of more ancient origin.
This indicates that the basic functions of
jawed-vertebrate haemoglobin had already
evolved in a common ancestor of these animals
but at some time after the split with jawless
vertebrates.

Next, Pillai et al. modelled the stepwise 
changes in α- and β-subunit interfaces that
might have allowed Ancα and Ancβ first to
form heterodimers with one another, and later 
heterotetramers from pairs of such dimers. 
The modelling indicated that strikingly few 
amino-acid substitutions might have been 
needed to transform a simple monomeric 
Figure 1 | Key steps in the evolution of the tetrameric haemoglobin protein. In jawed vertebrates,
haemoglobin exists as a tetramer, formed from two α- and two β-subunits. Pillai et al. resurrected extinct
ancestors of haemoglobin using predicted amino-acid sequences to reconstruct the protein’s evolution.
The authors showed that the protein AncMH, the last common ancestor of haemoglobin and the related
protein myoglobin, existed as a monomer. Duplication of the gene that encoded AncMH, and subsequent
divergence into two genes, produced monomeric myoglobin and the ancestor of haemoglobin, Ancα/β,
which forms a homodimer. Further gene duplication of Ancα/β and subsequent divergence yielded the
ancestors of the α- and β-subunits, dubbed Ancα and Ancβ. These two subunits evolved an interface that
allowed the formation of heterodimers. A few further changes in amino-acid residues generated a second
interface that allowed the assembly of modern-day α- and β-subunits into haemoglobin heterotetramers.


oxygen-binding protein similar to myoglobin 
(whose oxygen binding is non-cooperative and 
almost totally unaffected by allosteric effector 
molecules) into haemoglobin. Importantly,
the researchers verified the results of their 
model by generating proteins that carried 
mutations of the amino-acid residues iden- 
tified, and showing that heterotetramer for- 
mation was disrupted. 

The authors’ work shows how natural 
selection, acting on pre-existing biophysical 
protein properties, can, in just a few evolutionary
steps, create multimeric structures that
have complex functions. Most cellular pro- 
cesses involve the action of protein multimers, 
and Pillai and colleagues’ work serves as one 
of the clearest examples so far of how such 
complexity can arise during protein evolution. 

There are inevitable uncertainties in these 
kinds of reconstruction of the deep past of 
life, because the accuracy of such reconstructions
relies on the proteins under consideration
having several specific properties. The
proteins should, ideally, show small overall 
rates of amino-acid sequence divergence from 
one another, have thoroughly known and 
well-supported evolutionary relationships 
to each other, and exhibit a dense evolution- 
ary branching pattern over the time period(s) 
of interest. Finally, a detailed knowledge of 
structure-function relationships is essential. 
It would be difficult to reconstruct with any
confidence ancestral sequences and functions
for proteins that do not fulfil some or all
of these conditions. However, haemoglobin
is well suited for this type of study for several 
reasons. For instance, a wealth of compara- 
tive data on globin function across vertebrates 
has accumulated over the past 100 years.
We have intimate knowledge of haemoglobin
structure-function relationships.
In addition, there is an ever-expanding pool 
of globin sequence information, thanks to 
genome-sequencing projects in diverse organ- 
isms (these efforts will also benefit similar 
studies on other proteins). 

Pillai and colleagues’ study is sure to raise 
several follow-up questions and to spark fur- 
ther research. For instance, the authors used 
the artificial effector IHP in their experiments,
but the binding sites for physiologically relevant
effectors of haemoglobin oxygen affinity,
such as hydrogen ions, only partly overlap
with (and in some cases are quite different
from) the IHP binding site. Some evidence
suggests that mechanisms by which hydrogen
ions modify haemoglobin oxygen affinity
have evolved independently multiple times
in vertebrates. This would make the picture 
much more complicated than can be assessed 
using IHP. 

It will be interesting to probe the evolution- 
ary origins of the regulation of haemoglobin 
oxygen binding by other effectors, such 
as carbon dioxide or physiologically rele- 
vant organic phosphates including ATP and 
2,3-bisphosphoglycerate. In doing so, we 
could examine, for instance, how haemoglobin 
regulation changed as demands on the body’s 
oxygen-transport system rose during the evo- 
lution of warm-blooded, active vertebrates, or 
how it was affected by changes over geologi- 
cal time in atmospheric oxygen levels, which 
have been proposed by some to have shaped 
vertebrate evolution.

The assembly of multimeric proteins 
depends on specific concentrations and thus 
expression levels of the protein’s subunits. 
Natural selection presumably prevents imbal- 
anced subunit production, both to limit the 
costly energy expenditure involved in protein 
synthesis and to prevent accumulation of 
potentially harmful spare globin subunits, as
occurs in some hereditary human blood disorders.
But, at some point, an increase and balancing
of expression levels of haemoglobin’s
subunits would have been needed to enable 
tetramer formation. When did this occur? It 
has been shown that the net surface charge
of myoglobin acts as a molecular ‘signature’ 
that can be used to assess the expression levels 
of ancestral myoglobin. However, such mark- 
ers are largely unknown from other globin 
proteins, and we do not know the ancestral 
expression levels of any of the reconstructed 
haemoglobin precursors in Pillai and col- 
leagues’ study. 

Finally, as previously noted, one of the
most fascinating frontiers in this research 
field might be uncovering the evolutionary 
history of gene regulation. This remains an 
open question in the evolution of the genes 
that encode haemoglobin’s subunits. 
More than three-quarters of the baryonic content of the Universe resides in a highly
diffuse state that is difficult to detect, with only a small fraction directly observed in
galaxies and galaxy clusters. Censuses of the nearby Universe have used absorption
line spectroscopy to observe the ‘invisible’ baryons, but these measurements rely on
large and uncertain corrections and are insensitive to most of the Universe’s volume
and probably most of its mass. In particular, quasar spectroscopy is sensitive either to
the very small amounts of hydrogen that exist in the atomic state, or to highly ionized
and enriched gas in denser regions near galaxies. Other techniques to observe
these invisible baryons also have limitations; Sunyaev-Zel’dovich analyses can
provide evidence from gas within filamentary structures, and studies of X-ray
emission are most sensitive to gas near galaxy clusters. Here we report a
measurement of the baryon content of the Universe using the dispersion of a sample
of localized fast radio bursts; this technique determines the electron column density
along each line of sight and accounts for every ionized baryon. We augment the
sample of reported arcsecond-localized fast radio bursts with four new
localizations in host galaxies that have measured redshifts of 0.291, 0.118, 0.378 and
0.522. This completes a sample sufficiently large to account for dispersion variations
along the lines of sight and in the host-galaxy environments, and we derive a cosmic
baryon density of $\Omega_{\mathrm{b}} = 0.051^{+0.021}_{-0.025}\,h_{70}^{-1}$ (95 per cent confidence; $h_{70} = H_0/(70\ \mathrm{km\,s^{-1}\,Mpc^{-1}})$
and $H_0$ is Hubble’s constant). This independent measurement is consistent with values
derived from the cosmic microwave background and from Big Bang nucleosynthesis.


The Commensal Real-time ASKAP Fast Transients (CRAFT) survey
on the Australian Square Kilometre Array Pathfinder (ASKAP) has
commissioned a mode capable of localizing fast radio bursts (FRBs)
with subarcsecond accuracy, thus enabling identification of their
host galaxies and measurement of their redshifts. ASKAP consists
of 36 antennas equipped with phased array feeds, able to view
30 square degrees on the sky. Bursts are detected by incoherently summing
the total power signal of individual beams from each of the antennas.
Bursts detected in the incoherent pipeline are subsequently localized
interferometrically by triggering a download of voltage data
from a 3.1-s-duration ring buffer that is correlated and imaged at
high time resolution to provide the localizations. The 6-km baselines
of the array yield statistical position errors of approximately
$10''\,(\mathrm{S/N})^{-1}$, where the final coherent signal-to-noise ratio of the burst,
S/N, exceeds 50 for any burst whose signal-to-noise ratio in the incoherent
pipeline is greater than 9. The resulting statistical (thermal)
uncertainties are smaller than 0.2″. Systematic errors in these positions
are typically smaller than 0.5″. At z = 0.5, 1″ corresponds to 5 kpc,
which is approximately the precision needed to associate an FRB with
its host galaxy while reducing the chance coincidence probability to 
<1% (ref. *). 
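As a rough illustration of the astrometric budget quoted above, the following sketch evaluates the approximate statistical error of 10″/(S/N) and combines it in quadrature with a systematic term. The function names are hypothetical, and the 0.5″ systematic value is simply the typical upper bound quoted in the text, not a measurement for any particular burst.

```python
import math


def statistical_position_error_arcsec(snr: float) -> float:
    """Approximate statistical position error, 10 arcsec / (S/N)."""
    if snr <= 0:
        raise ValueError("signal-to-noise ratio must be positive")
    return 10.0 / snr


def total_position_error_arcsec(stat: float, syst: float) -> float:
    """Combine statistical and systematic errors in quadrature."""
    return math.hypot(stat, syst)


# A burst at the quoted coherent S/N floor of 50.
stat_err = statistical_position_error_arcsec(50.0)
# Assumed systematic term: the typical upper bound quoted in the text.
total_err = total_position_error_arcsec(stat_err, 0.5)
print(f"statistical: {stat_err:.2f} arcsec, total: {total_err:.2f} arcsec")
```

At S/N = 50 the statistical term is 0.2″, so under these assumptions the systematic term dominates the total.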

We report the detection of four localized ASKAP bursts. Table 1 lists 
the burst properties, sky positions and host galaxy offsets, while Fig. 1 
shows the host galaxy identifications (see also ref.” and Methods). Their 
dispersion measures (DMs) are well in excess of the 30-100 pc cm$^{-3}$
contributions expected from the disk and halo of the Milky Way at
high Galactic latitudes, with the large excesses attributable to the
intergalactic medium (IGM) and gas within each burst host galaxy. Two
other ASKAP-detected bursts and their host galaxies were reported
previously, in addition to three other host-galaxy identifications.

The precise localization of a set of FRBs to their host galaxies provides
the first ensemble of $\mathrm{DM}_{\mathrm{FRB}}$ and $z_{\mathrm{FRB}}$ measurements. The $\mathrm{DM}_{\mathrm{FRB}}$
measurement represents the electron density weighted by $(1+z)^{-1}$,
integrated over all physical distance increments $ds$ to a given FRB:
$\mathrm{DM}_{\mathrm{FRB}} = \int n_e\,ds/(1+z)$. Physically, we expect $\mathrm{DM}_{\mathrm{FRB}}$ to separate into four
primary components:
Seattle, WA, USA. Commonwealth Science and Industrial Research Organisation, Australia Telescope National Facility, Epping, New South Wales, Australia. Centre for Astrophysics and
Supercomputing, Swinburne University of Technology, Hawthorn, Victoria, Australia. Department of Physics and Astronomy, Macquarie University, North Ryde, New South Wales, Australia.
Instituto de Física, Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile. e-mail: J.Macquart@curtin.edu.au; xavier@ucolick.org
Table 1 | Properties of FRBs interferometrically localized with ASKAP

| FRB | Time (UTC)^a | DM (pc cm$^{-3}$) | Rotation measure (rad m$^{-2}$) | Fluence (Jy ms)^b | Right ascension (h:min:s)^c | Declination (degree:arcmin:arcsec)^c | Host redshift | Offset from host nucleus (kpc) |
|---|---|---|---|---|---|---|---|---|
| 180924 | 16:23:12.6265 | 361.42(6) | 14(1) | 16(1) | 21:44:25.255 ± 0.006 ± 0.008 | -40:54:00.10 ± 0.07 ± 0.09 | 0.3214 | 3.5 ± 0.9 |
| 181112 | 17:31:15.48365 | 589.27(3) | 10.9(9) | 26(3) | 21:49:23.63 ± 0.05 ± 0.24 | -52:58:15.4 ± 0.3 ± 1.4 | 0.4755 | 3.12," |
| 190102 | 05:38:43.49184 | 363.6(3) | 110 | 14(1) | 21:29:39.76 ± 0.06 ± 0.16 | -79:28:32.5 ± 0.2 ± 0.5 | 0.291 | 1:52 |
| 190608 | 22:48:12.88391 | 338.7(5) | - | 26(4) | 22:16:04.75 ± 0.02 ± 0.02 | -07:53:53.6 ± 0.3 ± 0.3 | 0.1178 | 7.0 ± 1.3 |
| 190611 | 05:45:43.29937 | 321.4(2) | - | 10(2) | 21:22:58.91 ± 0.11 ± 0.23 | -79:23:51.3 ± 0.3 ± 0.6 | 0.378 | 17.2 ± 4.9 |
| 190711 | 01:53:41.09338 | 593.1(4) | - | 34(3) | 21:57:40.68 ± 0.051 ± 0.15 | -80:21:28.8 ± 0.08 ± 0.3 | 0.522 | 15° |

The FRB detection pipeline makes use of the DBSCAN algorithm, as implemented by ref. ”°, to mitigate RFI and reduce the frequency of false-positive FRB triggers.

^a Burst arrival time referenced to a frequency of 1,152 MHz.
^b Quoted errors on the last significant digit of the fluence represent a 90% confidence limit.
^c Errors listed after the burst position represent the statistical and systematic uncertainties respectively, and are combined in quadrature for a final absolute positional uncertainty.


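$$\mathrm{DM}_{\mathrm{FRB}}(z) = \mathrm{DM}_{\mathrm{MW,ISM}} + \mathrm{DM}_{\mathrm{MW,halo}} + \mathrm{DM}_{\mathrm{cosmic}}(z) + \mathrm{DM}_{\mathrm{host}}(z) \qquad (1)$$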


with $\mathrm{DM}_{\mathrm{MW,ISM}}$ the contribution from our Galactic ISM, $\mathrm{DM}_{\mathrm{MW,halo}}$ the
contribution from our Galactic halo, $\mathrm{DM}_{\mathrm{host}}$ the contribution from
the host galaxy including its halo and any gas local to the event, and
$\mathrm{DM}_{\mathrm{cosmic}}$ the contribution from all other extragalactic gas. Only $\mathrm{DM}_{\mathrm{cosmic}}$,
determined by its path length through the IGM and the increase in
baryon density with look-back time, is expected to have a strong redshift
dependence, although $\mathrm{DM}_{\mathrm{host}}$ is weighted by $(1 + z_{\mathrm{FRB}})^{-1}$ and may
correlate with age, for example, if host galaxies have systematically
lower mass at earlier times.

Adopting our cosmological paradigm of a flat universe with matter
and dark energy, the average value of $\mathrm{DM}_{\mathrm{cosmic}}$ to redshift $z_{\mathrm{FRB}}$ is:

$$\langle \mathrm{DM}_{\mathrm{cosmic}} \rangle = \int_0^{z_{\mathrm{FRB}}} \frac{c\,\bar{n}_e(z)\,dz}{H_0\,(1+z)^2\,\sqrt{\Omega_m (1+z)^3 + \Omega_\Lambda}} \qquad (2)$$

with mean density $\bar{n}_e = f_d\,\rho_b(z)\,m_p^{-1}\,(1 - Y_{\mathrm{He}}/2)$, where $m_p$ is the proton
mass, $Y_{\mathrm{He}} = 0.25$ is the mass fraction of helium, assumed doubly ionized
in this gas, $f_d(z)$ is the fraction of cosmic baryons in diffuse ionized gas
(this accounts for dense baryonic phases, for example, stars and neutral
gas; see Methods), $\rho_b(z) = \Omega_b\,\rho_{c,0}\,(1+z)^3$, and $\Omega_m$ and $\Omega_\Lambda$ are the matter
and dark energy densities today in units of $\rho_{c,0} = 3H_0^2/8\pi G$, where
we parameterize Hubble's constant $H_0$ in terms of the dimensionless
$h_{70} = H_0/(70$ km s$^{-1}$ Mpc$^{-1})$.
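To make the mean cosmic dispersion measure integral concrete, here is a minimal numerical sketch in pure Python. It is not the authors' implementation (theirs is in the public FRB repository cited in Methods). It assumes a constant diffuse fraction `f_d = 0.84`, which is a placeholder since the paper uses a redshift-dependent $f_d(z)$, a flat cosmology with $\Omega_m = 1 - \Omega_\Lambda$, and the Planck15 values quoted in the text. The function name, argument names and physical-constant values are supplied here for illustration.

```python
import math

# Physical constants in SI units (values supplied for this sketch).
C_M_S = 2.99792458e8        # speed of light
G_SI = 6.67430e-11          # gravitational constant
M_P_KG = 1.67262192e-27     # proton mass
PC_M = 3.0856775814913673e16
MPC_M = PC_M * 1.0e6


def mean_dm_cosmic(z_frb, omega_b=0.0486, h0_km_s_mpc=67.74,
                   omega_lambda=0.691, f_d=0.84, y_he=0.25, n_steps=1000):
    """Mean cosmic DM in pc cm^-3 out to z_frb for a flat universe.

    f_d is held constant here, which is an assumption of this sketch.
    n_steps must be even (composite Simpson rule).
    """
    omega_m = 1.0 - omega_lambda                 # flatness
    h0 = h0_km_s_mpc * 1.0e3 / MPC_M             # s^-1
    rho_c0 = 3.0 * h0 ** 2 / (8.0 * math.pi * G_SI)   # kg m^-3
    # Electron density today; n_e(z) = n_e0 * (1 + z)^3.
    n_e0 = f_d * omega_b * rho_c0 / M_P_KG * (1.0 - y_he / 2.0)  # m^-3

    def integrand(z):
        # (1+z)^3 from the density over (1+z)^2 in the denominator.
        return (1.0 + z) / math.sqrt(omega_m * (1.0 + z) ** 3 + omega_lambda)

    step = z_frb / n_steps
    total = integrand(0.0) + integrand(z_frb)
    for i in range(1, n_steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * step)
    integral = total * step / 3.0

    dm_si = (C_M_S / h0) * n_e0 * integral       # m^-2
    return dm_si / (PC_M * 1.0e6)                # convert to pc cm^-3


dm_half = mean_dm_cosmic(0.5)
print(f"<DM_cosmic>(z = 0.5) ~ {dm_half:.0f} pc cm^-3")
```

With these assumed inputs the result at z = 0.5 is a few hundred pc cm$^{-3}$, and it scales linearly with both `omega_b` and `f_d`.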

The $\mathrm{DM}_{\mathrm{MW,ISM}}$ term (equation (1)) arises primarily from the so-called
warm ionized medium of the Galaxy and is estimated from a model of
this ISM component. At the high Galactic latitudes ($|b| > 33^\circ$) of the
ASKAP sample, the value is $\mathrm{DM}_{\mathrm{MW,ISM}} \approx 30$ pc cm$^{-3}$. The $\mathrm{DM}_{\mathrm{MW,halo}}$ term
is not well constrained, but is expected to be in the range of approximately
50-100 pc cm$^{-3}$. Hereafter we assume $\mathrm{DM}_{\mathrm{MW,halo}} = 50$ pc cm$^{-3}$
and emphasize that the sum of its scatter and uncertainty is less than
those of $\mathrm{DM}_{\mathrm{cosmic}}$ and $\mathrm{DM}_{\mathrm{host}}$, which we discuss below.

Figure 2 shows the theoretical curve for $\langle\mathrm{DM}_{\mathrm{cosmic}}\rangle$ versus $z_{\mathrm{FRB}}$ for
the Planck15 cosmology and a model estimate of the scatter (90%
interval) due to statistical variations in foreground cosmic structure
(see Methods). Overplotted on the model are the estimated $\mathrm{DM}_{\mathrm{cosmic}}$
and measured $z_{\mathrm{FRB}}$ values for all arcsecond-localized FRBs. We have
estimated $\mathrm{DM}_{\mathrm{cosmic}}$ by subtracting the following from the measured
$\mathrm{DM}_{\mathrm{FRB}}$ value: $\mathrm{DM}_{\mathrm{MW,ISM}}$ from the Galactic ISM model; our assumed
$\mathrm{DM}_{\mathrm{MW,halo}}$ contribution; and an ansatz of $\mathrm{DM}_{\mathrm{host}} = 50/(1+z)$ pc cm$^{-3}$, estimated
from theoretical work and informed by the analysis below.
We ignore FRB 121102 and FRB 190523 in most of the analysis that follows
because of selection bias in their discovery, FRB 180916 owing
to its low Galactic latitude (see Methods), and FRB 190611 because
of its tentative association with a host galaxy (see Methods). The five
ASKAP FRBs that remain comprise what we term the gold-standard
sample. The agreement between model and data is striking. Effectively,
the FRB measurements confirm the presence of baryons with
the density estimated from the cosmic microwave background (CMB) 
and Big Bang nucleosynthesis (BBN), and these five measurements 
are consistent with all the missing baryons being present in the 
ionized IGM. 
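The subtraction used to place each burst on the DM-redshift diagram can be sketched as follows. This is an illustration only: it uses the round values quoted in the text (about 30 pc cm$^{-3}$ for the Galactic ISM at high latitude, 50 pc cm$^{-3}$ for the Galactic halo, and a host term of 50/(1+z) pc cm$^{-3}$), whereas the paper takes the ISM term from a sightline-dependent model. The function name is hypothetical; the DM and redshift inputs are taken from Table 1.

```python
def estimate_dm_cosmic(dm_frb, z, dm_mw_ism=30.0, dm_mw_halo=50.0,
                       dm_host_rest=50.0):
    """Estimate DM_cosmic (pc cm^-3) by subtracting the assumed
    Galactic ISM, Galactic halo and redshift-scaled host terms."""
    return dm_frb - dm_mw_ism - dm_mw_halo - dm_host_rest / (1.0 + z)


# (DM in pc cm^-3, host redshift) for two bursts listed in Table 1.
bursts = {
    "FRB 180924": (361.42, 0.3214),
    "FRB 190102": (363.6, 0.291),
}

for name, (dm, z) in bursts.items():
    print(f"{name}: DM_cosmic ~ {estimate_dm_cosmic(dm, z):.1f} pc cm^-3")
```

Because the Galactic ISM term is approximated by a single number here, the outputs should be read as indicative rather than as the values plotted in Fig. 2.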

This result motivated us to quantitatively test for the consistency
of $\Omega_b h_{70}$ with CMB and BBN measurements, simultaneously determining
the uncertain host-galaxy contributions to $\mathrm{DM}_{\mathrm{FRB}}$ as well as the
sightline-to-sightline variance in dispersion owing to the IGM. We do
this by analysing the joint likelihood of our sample of five (seven, including
FRBs 190523 and 190611) $\mathrm{DM}_{\mathrm{FRB}}$, $z_{\mathrm{FRB}}$ measurements against a four-parameter
model: one parameter for the large-scale structure scatter
in $\mathrm{DM}_{\mathrm{cosmic}}$, two parameters for $\mathrm{DM}_{\mathrm{host}}$ (a mean and a scatter), and $\Omega_b h_{70}$.
Our model for the contributions to $\mathrm{DM}_{\mathrm{FRB}}$ starts with equations (1) and
(2), and we develop parametric models for $\mathrm{DM}_{\mathrm{host}}$ and the intrinsic
scatter in $\mathrm{DM}_{\mathrm{cosmic}}$. We again fix $\mathrm{DM}_{\mathrm{MW,halo}} = 50$ pc cm$^{-3}$ and adopt the
Galactic ISM model for $\mathrm{DM}_{\mathrm{MW,ISM}}$. Uncertainty in these values can be
absorbed into our model for $\mathrm{DM}_{\mathrm{host}}$.

For $\mathrm{DM}_{\mathrm{cosmic}}$, our model accounts for scatter in the electron column
from foreground structures, which is largely caused by random variation
in the number of haloes a given sightline intersects. Cosmological
simulations show that this variation is sensitive to the extent to which
galactic feedback redistributes baryons around galactic haloes,
and that the fractional standard deviation of the cosmic DM equals
approximately $F z^{-1/2}$ for $z < 1$, where the parameter $F$ quantifies the
strength of the baryon feedback (0.1 being strong feedback and 0.4
being weak). Stronger feedback corresponds to situations in which
feedback processes expel baryons to larger radii from their host galaxies
or where more massive haloes are evacuated by such feedback. The
formalism incorporates the effect of large-scale structure associated
with voids and the intersection of sightlines with clusters. We find that
a one-parameter model based on a physically motivated shape for
the probability distribution of $\mathrm{DM}_{\mathrm{cosmic}}$ given $F$ provides a successful
description of a wide range of cosmological simulations (see Methods).
Our form for the distribution is strongly asymmetric towards lower
redshifts, admitting large $\mathrm{DM}_{\mathrm{cosmic}}$ values that we find are important
in the estimation of $\Omega_b h_{70}$.
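The scaling of the sightline-to-sightline scatter with redshift can be illustrated with a short sketch. It assumes the approximate form stated above, a fractional standard deviation of $F z^{-1/2}$ valid for $z < 1$, and the function name is hypothetical. It does not reproduce the asymmetric probability distribution the paper actually fits.

```python
import math


def fractional_dm_scatter(f_feedback: float, z: float) -> float:
    """Approximate fractional standard deviation of DM_cosmic,
    F * z**-0.5, taken to be valid only for 0 < z < 1."""
    if not 0.0 < z < 1.0:
        raise ValueError("approximation is stated for 0 < z < 1")
    return f_feedback / math.sqrt(z)


# Compare strong (F = 0.1) and weak (F = 0.4) feedback at z = 0.25.
for f_value in (0.1, 0.2, 0.4):
    print(f"F = {f_value}: sigma_DM ~ {fractional_dm_scatter(f_value, 0.25):.2f}")
```

The fractional scatter falls with redshift because more distant sightlines average over more intervening structure.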

We chose our model for $\mathrm{DM}_{\mathrm{host}}$ to follow a log-normal distribution,
characterized by a median $\exp(\mu)$ and logarithmic width parameter
$\sigma_{\mathrm{host}}$ such that the standard deviation of the distribution is
$\exp(\mu)\,e^{\sigma_{\mathrm{host}}^2/2}\,(e^{\sigma_{\mathrm{host}}^2} - 1)^{1/2}$. We do not attempt to incorporate redshift-dependent
evolution in the host-galaxy dispersion contribution, but
we do scale the distribution of $\mathrm{DM}_{\mathrm{host}}$ by the factor $(1 + z_{\mathrm{host}})^{-1}$ applicable
to a parcel of plasma at redshift $z_{\mathrm{host}}$, so that $\mathrm{DM}_{\mathrm{host}}$ is interpreted as the
dispersion in the rest frame of the host galaxy. Our choice of a
log-normal distribution is conservative in that it allows for a tail
Fig. 1 | Locations of FRBs relative to their host galaxies. a-f, Optical images
of the host galaxies of six FRBs localized by ASKAP, including the four new
bursts reported here. a, FRB 180924; b, FRB 181112; c, FRB 190102; d,
FRB 190608; e, FRB 190611; and f, FRB 190711. HG, host galaxy. White ellipses
denote the 90% confidence region of each burst position, including statistical


extending to large positive values, which may not be present in our
sample given our selection criteria (see Methods) and the burst locations
relative to the host stellar surface density. We explore $\mathrm{DM}_{\mathrm{host}}$
distributions with median values in the range $\exp(\mu) = 20$-$200$ pc cm$^{-3}$
and $\sigma_{\mathrm{host}}$ in the range 0.2-2.0.
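The log-normal host model can be sketched with the standard library alone. The snippet below evaluates the standard deviation of a log-normal distribution with median $\exp(\mu)$ and width $\sigma_{\mathrm{host}}$, and draws observer-frame host contributions scaled by $(1 + z_{\mathrm{host}})^{-1}$. Function names, the random seed and the example inputs (median 100 pc cm$^{-3}$, $\sigma_{\mathrm{host}} = 1$, z = 0.3) are illustrative choices, not values prescribed by the paper's pipeline.

```python
import math
import random


def lognormal_host_std(median: float, sigma_host: float) -> float:
    """Standard deviation of a log-normal with the given median."""
    s2 = sigma_host ** 2
    return median * math.exp(s2 / 2.0) * math.sqrt(math.exp(s2) - 1.0)


def sample_observed_dm_host(median, sigma_host, z_host, rng):
    """Draw a rest-frame host DM and scale it to the observer frame."""
    rest_frame = rng.lognormvariate(math.log(median), sigma_host)
    return rest_frame / (1.0 + z_host)


rng = random.Random(12345)  # fixed seed so the sketch is repeatable
std_100 = lognormal_host_std(100.0, 1.0)
draws = [sample_observed_dm_host(100.0, 1.0, 0.3, rng) for _ in range(5)]
print(f"std for median 100, sigma 1: {std_100:.1f} pc cm^-3")
print("example observed host DMs:", [round(d, 1) for d in draws])
```

The long positive tail is visible in the fact that the standard deviation is roughly twice the median for $\sigma_{\mathrm{host}} = 1$.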

Our final analysis compares the relative likelihood of models in a
four-parameter space ($\Omega_b h_{70}$, $F$, $\sigma_{\mathrm{host}}$, $\mu$); see Methods for a Bayesian
approach that yields similar constraints. Marginalizing the other


uncertainty and phase referencing errors, while the red crosses mark the
measured centroids of each host galaxy. The identification of the host galaxy of
FRB 190611 is tentative. a-d, Deep VLT g-band images; e, f, deep GMOS i-band
images.


parameters over ranges restricted by other physical constraints, we
derive the constraints on $\Omega_b h_{70}$ shown in Fig. 3 using our five-FRB
gold-standard sample. The results are fully consistent with the joint
CMB + BBN estimations and, with only five (seven) burst redshifts, the
experiment yields a precision of $\sigma(\Omega_b h_{70})/(\Omega_b h_{70}) = 0.31$ (0.28) at the
68% confidence level, with $F$ marginalized over the range [0.09, 0.32]
(see Methods). This quantitative result for $\Omega_b h_{70}$ substantiates our
inference from the DM-z relation in Fig. 2 that the FRB ensemble has
resolved the missing baryons problem. The ratio of our estimated


Q, to that from CMB and BBN measurements is 1.1°02 A79. Formally, 


we exclude $\Omega_b h_{70} < 0.02$ (0.01) at the 98.6% (99.8%) confidence level.
Fig. 2 | The DM-redshift relation for localized FRBs. Data points are
estimations of the cosmic dispersion measure ($\mathrm{DM}_{\mathrm{cosmic}}$) versus FRB redshift
($z_{\mathrm{FRB}}$) for all current arcsecond- and subarcsecond-localized FRBs. The $\mathrm{DM}_{\mathrm{cosmic}}$
values are derived by correcting the observed dispersion measure $\mathrm{DM}_{\mathrm{FRB}}$ for
the estimated contributions from our Galaxy and the FRB host galaxy (the
latter assumed here to be $50(1+z)^{-1}$ pc cm$^{-3}$; see text for details). Coloured
points represent the gold-standard sample on which our primary analysis is
based. The solid line denotes the expected relation between $\mathrm{DM}_{\mathrm{cosmic}}$ and
redshift for a universe based on the Planck15 cosmology (that is, $\Omega_b = 0.0486$ and
$H_0 = 67.74$ km s$^{-1}$ Mpc$^{-1}$). The shaded region encompasses 90% of the $\mathrm{DM}_{\mathrm{cosmic}}$
values from a model for ejective feedback in Galactic haloes that is motivated
by some simulations (with $F = 0.2$ in equation (4) in Methods), illustrating that
the observed scatter is largely consistent with the scatter from the IGM.


This constraint should improve considerably in the near term 
as ASKAP and other facilities acquire a larger sample of bursts 
with redshifts. 
Fig. 3 | The density of cosmic baryons derived from the FRB sample. The
constraints on the IGM parameters $\Omega_b h_{70}$ and $F$, and the host galaxy parameters
$\exp(\mu)$ and $\sigma_{\mathrm{host}}$, for a log-normal DM distribution are derived using the five
gold-standard bursts (as described in the text and Methods). a, Corner plots
displaying the probability of a given value of $F$, $\exp(\mu)$ or $\sigma_{\mathrm{host}}$, relative to its most
likely value, and marginalized over the other parameters: heavy dashed lines
represent the most likely values in each case. The green, dotted lines in the
corner plots of $F$, $e^{\mu}$ and $\sigma_{\mathrm{host}}$ denote the relative likelihood of these parameters
when $\Omega_b h_{70}$ is constrained to the value set by the CMB + BBN measurements.
The contours displayed are in increments of 10% of the peak value. b, Magnified
view of the corner plot for $\Omega_b h_{70}$, where the orange shaded region denotes the
range to which $\Omega_b h_{70}$ is confined by CMB + BBN measurements. The dotted and
dot-dashed lines represent the 68% and 95% confidence intervals of each
parameter, respectively. The distribution of $\Omega_b$ is alternatively marginalized
over the range of $F$ indicated by cosmological simulations, [0.09, 0.32] (blue
curve; see Methods), and over the entire range of $F$, [0, 0.5], investigated here
(red curve).


Additionally, analysis of our gold-standard sample mildly favours a
median host galaxy contribution of about 100 pc cm$^{-3}$ with a factor of
two dispersion around this value ($\sigma_{\mathrm{host}} \approx 1$). This quantifies our result
that the host contributions are sufficiently small to not compromise
the use of FRBs for cosmology and IGM science. Even with our current
small sample, we are beginning to constrain viable models for the redistribution
of the cosmic baryons by galactic feedback. If we adopt a
prior on $\Omega_b h_{70}$ from the CMB, BBN and supernovae surveys, we find
F=0.04*076 (68% confidence), and if we further include FRB 190523 
and FRB 190611 we find F= 0.237027 (see Methods). A factor of two 
smaller error would start to differentiate between viable feedback 
scenarios (as discussed further in Methods), suggesting that FRBs have 
not only revealed that all the baryons are present but—with modestly 
larger samples—could constrain where they lie. 
Methods


Sample selection 

Analogously to cosmological studies of the distance ladder with supernovae,
we wish to establish a strict set of criteria for the FRB sample
to minimize biases while maximizing statistical power. On the latter
point, we wish to construct the largest sample while avoiding events
whose DM is dominated by non-cosmological effects (that is, host or
Galactic gas). Regarding bias, the greatest concern is the association
of a host galaxy with a given FRB on the basis of the DM-z relation,
that is, adopting this relation as a prior to establish the host identity.
This is a valuable practice when one aims to resolve the underlying
host-galaxy population but would bias any cosmological study.
Last, one must also be cognizant of biases related to triggering on FRB
events. This practice is a complex function of the FRB fluence, its DM,
and the pulse properties.

With these issues in mind, we propose the following set of criteria 
(1-4) to generate a ‘gold-standard sample’ of FRBs for cosmological 
study: 

1. To make a confident host-galaxy association, we require the probability
of mis-identifications to be <1% without invoking the DM-z
relation since to do so would introduce a bias. For this we require
the 95% localization area to encompass one and only one galaxy
unless multiple galaxies have a common redshift. By ‘encompass’
we include light from any part of the galaxy. In practice, this will
require a localization to <1″ for z > 1, becoming less stringent
for less distant hosts. We propose an initial set of specific criteria as
follows. (i) Define 95% areas for the localization and for each galaxy
in the region. Call these $L$ and $G_1$, $G_2$, and so on. (ii) Demand one and
only one galaxy overlap $L$. The only exception is if $z_1 = z_2$. (iii) For the
overlapping $L$ and $G_i$, require that >50% of the smaller area lies within
the larger. (iv) Do this for galaxies as faint as R = 25 (anything fainter
is generally too difficult for a spectroscopic redshift anyhow).

2. The finite temporal and spectral resolution of the FRB survey causes
a decline in sensitivity with increasing DM to the point that telescope
resolution causes an effective threshold at $\mathrm{DM}_{\mathrm{cutoff}}$, at which
point a burst would no longer have been detectable. A conservative
approach would omit any burst with $\mathrm{DM}_{\mathrm{cutoff}}$ sufficiently low that it
excludes a large (≳30%) fraction of the total probability of $p(\mathrm{DM}_{\mathrm{cosmic}}|z)$,
on the grounds that it presents a biased probe of $p(\mathrm{DM}_{\mathrm{cosmic}}|z)$. Although
application of this criterion presupposes a DM-z relation
and its probability density function (PDF), it does so only weakly.
This point is addressed in detail below in the subsection on biases in
the probability distribution.

An event detected near to the sensitivity threshold is biased in the
sense that the instrumental decline in sensitivity with increasing
DM dictates that any burst detected near this threshold would not
have been detectable at higher DMs. Thus, for a given redshift we are
biased to finding events with DMs that are under-representative of
the entire DM distribution at that redshift. Only more luminous
bursts, whose detection S/N is sufficiently high that $\mathrm{DM}_{\mathrm{cutoff}}$ exceeds
the plausible range of $\mathrm{DM}_{\mathrm{cosmic}}$ at that redshift, are devoid of this bias.

3. FRB events with extreme properties (for example, high RM, large 
temporal broadening) will be excluded to minimize the impact of 
host galaxy and Galactic gas. 

4. A cutoff is imposed on FRBs whose expected contribution from the
disk component of the Milky Way ISM is large, to avoid large uncertainties
in the subtraction of the Galactic ISM DM contribution.
Models of the Galactic plasma distribution typically produce errors
in known pulsar distances of the order of several tens of per cent
(and much higher in some cases). To avoid DM errors in excess
of about 100 pc cm$^{-3}$ we restrict our sample to those bursts whose
predicted $\mathrm{DM}_{\mathrm{ISM}}$ values are less than 100 pc cm$^{-3}$. A conservative
application of this criterion restricts FRB detections to sight lines at
high Galactic latitude, $|b| \gtrsim 20^\circ$.


We acknowledge that all criteria 1-4 are subject to refinement as we 
learn more about FRB progenitors and their host galaxies. 

Regarding criterion 3, a dominant contributor to the DM variance
is the circumburst environment and the ISM of the host galaxy.
Although it is not possible to make a precise estimate of this component,
the burst RM, the amount of Faraday rotation exhibited by linearly
polarized emission caused by its propagation through a
magnetized plasma, presents a means of identifying those bursts
whose radiation has probably propagated through a substantial
(>100 pc cm$^{-3}$) amount of matter in the host galaxy. For each burst the
Milky Way contribution to RM for $|b| > 10^\circ$ is small (<250 rad m$^{-2}$) and
measurable and the IGM contribution is estimated to be about
1 rad m$^{-2}$. Galactic haloes, similarly, have been inferred to make contributions
of several tens of rad m$^{-2}$ to the RM from radio-loud quasar
observations, but our first analysis with an FRB yields RM < 10 rad m$^{-2}$.
A suitable cutoff due to host-galaxy ISM contamination is suggested 
by assuming the host-galaxy magnetic field strength is comparable 
to that of our Galaxy. Measurements of Faraday rotation and disper- 
sion from pulsars in the Milky Way (see figure 3 in ref. **) exhibit a mean 
trend DM =1.55|RM|°> = f,(RM), where RM and DM are measured in 
their usual units of rad m$^{-2}$ and pc cm$^{-3}$ respectively. We find that the
root-mean-square (r.m.s.) deviation of the actual DM values from their
values predicted on the basis of this trend using |RM| is 69% of the
DM (that is, the r.m.s. errors are 69% of the mean DM value:
$\langle [\mathrm{DM} - f_{\mathrm{DM}}(\mathrm{RM})]^2/\mathrm{DM}^2 \rangle^{1/2} = 0.69$). We further find that there is an 85%
probability that the actual DM deviates from its predicted value by less
than 0.9 times the actual DM value, and a 96% probability that the predicted
value differs by less than 2.0 times the DM value. We therefore
suggest that a cutoff criterion $|\mathrm{RM} - \mathrm{RM}_{\mathrm{MW}}|_{\mathrm{observed}} < 100(1+z)^{-2}$ rad m$^{-2}$
bounds the dispersion measure to DM < $250(1+z)^{-1}$ pc cm$^{-3}$ with 85%
confidence.
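A rotation-measure cut of this kind could be applied as in the sketch below. It assumes that an observed RM scales as $(1+z)^{-2}$ and an observed host DM as $(1+z)^{-1}$ relative to their rest-frame values, with rest-frame limits of 100 rad m$^{-2}$ and 250 pc cm$^{-3}$. The function names are hypothetical, and the default `rm_mw = 0.0` is a placeholder for the Milky Way contribution, which in practice would come from a Galactic RM estimate along the sightline.

```python
def passes_rm_cutoff(rm_obs, z, rm_mw=0.0, rm_rest_limit=100.0):
    """True if |RM - RM_MW| is below the redshift-scaled limit (rad m^-2)."""
    return abs(rm_obs - rm_mw) < rm_rest_limit / (1.0 + z) ** 2


def host_dm_upper_bound(z, dm_rest_limit=250.0):
    """Observer-frame bound on the host DM (pc cm^-3) implied by the cut."""
    return dm_rest_limit / (1.0 + z)


# FRB 180924 from Table 1: RM = 14 rad m^-2, z = 0.3214.
ok_180924 = passes_rm_cutoff(14.0, 0.3214)
# A purely illustrative burst with an RM three orders of magnitude larger.
ok_extreme = passes_rm_cutoff(1.4e4, 0.3214)
print(ok_180924, ok_extreme, round(host_dm_upper_bound(0.3214), 1))
```

Under these assumptions a burst with an RM of order 10 rad m$^{-2}$ passes comfortably, while one with an RM thousands of times larger is rejected.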

A similar trend observed between the DM and the temporal smearing
of Galactic pulsars caused by scattering can also be used to place
upper bounds on the host contribution. Recent updates to this relation
indicate that, on average, a pulse smearing time, $\tau$, less than 33 ms
(2 ms) limits DM to <200 pc cm$^{-3}$ (300 pc cm$^{-3}$) at 0.327 GHz (1 GHz).
However, the DM-$\tau$ relation exhibits ~0.8 dex variation about the trend
(as discussed in the context of FRBs in the supplementary material in
ref.’’), thus requiring $\tau < 5$ ms to ensure a reasonable (~70%) confidence
that the DM contribution is less than 200 pc cm$^{-3}$. We caution that the
use of $\tau$ as an indicator of the host-galaxy DM contribution is subject to
considerable uncertainty, since neither the distances to the scattering
material from the bursts, nor even the nature of the turbulence responsible
for the temporal smearing observed in FRBs, is well established.
The estimates presented here would be invalid, for instance, if the scattering
were associated with the direct burst environment rather than
the ISM of the host galaxy.

Applying all the above criteria to the current set of FRBs with redshift
estimates based on their association with galaxies (Table 1), we
eliminate the following sources from cosmological analysis.


FRB 121102. We exclude the repeating FRB 121102 from our analysis
for two reasons: (a) the rotation measure of this burst is anomalously
high, being three orders of magnitude higher than other FRBs in this
sample and indicating that this burst DM is likely contaminated by an
abnormally high circumburst or host galaxy contribution, and (b) its
location within 2° of the Galactic plane imparts a larger and probably
less well constrained DM contribution from the Milky Way relative to
the high Galactic latitude bursts detected by ASKAP.


FRB 190523. We have conducted the analysis both with and without
FRB 190523. The host galaxy identification from the larger, 3″ × 8″
localization region is more uncertain than the ASKAP FRB detections
and was partially based on an assumed DM-redshift relation, which
presents a potential source of bias in our analysis.


FRB 171020. The identification of the host galaxy associated with
FRB 171020 is predicated on a search volume confined to a specific
distance based on an assumed DM-redshift relation, and this burst is therefore
excluded from the sample. Moreover, it is difficult to ascribe a numerical
value to the likelihood of a correct association in this instance.


FRB 190611. Our follow-up observations for FRB 190611 identify a galaxy
at J212258.0-792350 with redshift z = 0.378 offset by ~2″ from the
current estimate of the FRB localization. The large offset (~10 kpc at
that redshift), the large systematic uncertainty in the FRB localization
and the presence of a closer, faint source revealed by deep GMOS i-band
imaging preclude a secure association at present. As with FRB 190523,
we conduct our analysis both with and without this burst in our sample.


FRB imaging and astrometry 

The procedure for characterizing the position and positional uncertainty
of FRBs 190102, 190608, 190611 and 190711 followed that
described in the supplementary material of refs. *"°. For the purposes
of extracting these observables, we use only the total intensity data.

For each FRB, raw voltage data for a suitable calibrator source was
captured via the CRAFT pipeline in the hours following the burst detection.
For FRB 190102 and FRB 190608, the source PKS 1934-638 was
used, while for FRB 190611 and FRB 190711, it was PKS 0407-658. From
these calibrator data and the FRB data, visibility data sets were produced
using the DiFX correlator. An initial coarse search for the FRB
position used the DM, pulse duration, and approximate position from
the incoherently summed FRB detection data, and after detection in the
interferometric data a re-correlation was performed with revised position,
DM, and pulse time/duration. Radio frequency interference (RFI)
was mitigated for the FRB data set by subtracting visibilities from an
adjacent time range surrounding the burst itself. Additionally, for each
FRB a visibility data set and image were generated using the entire 3.1 s
of raw voltage data, to identify background radio continuum sources
whose positions could be compared to catalogue values and verify the
astrometric accuracy.

Per-station frequency-dependent complex gain calibration was
derived from the calibrator data set using the ParselTongue-based
pipeline described in ref. ° and transferred to the FRB datasets, before
imaging in the Common Astronomy Software Applications (CASA)
package. Best-fit positions and uncertainties were then extracted for
each source using the task JMFIT in the Astronomical Image Processing
System (AIPS).

Statistical uncertainties on the FRB positions were less than 0.5″ in
all cases. However, as discussed in refs. *"°, the phase referenced FRB
images will be subject to a systematic positional shift resulting from
the spatial and temporal extrapolation of calibration solutions. The
magnitude of this systematic shift can be estimated by comparing the
positions of continuum sources in the field surrounding the FRBs to
their catalogue values. The accuracy to which this can be performed
depends on the number of continuum sources visible in the ASKAP
continuum image and their brightness, as well as the degree to which
their intrinsic source structure can be modelled (or neglected). For any
given continuum source, the presence of unmodelled structure will act
to shift the position of the source centroid and results in a measured
offset between the ASKAP and reference positions, which perturbs the
actual systematic positional shift. However, the direction of such a shift
depends on the source structure, and hence should not be correlated
between different continuum sources. For FRB 190102 and FRB 190611,
observations made with the Australia Telescope Compact Array at a
comparable frequency and angular resolution to the ASKAP image
minimize the impact of source structure, but for FRB 190608, we made
use of the Faint Images of the Radio Sky at Twenty centimetres (FIRST)
survey, which has angular resolution roughly twice that of the ASKAP
images, and for FRB 190711 we used archival 5 GHz ATCA data.


Assuming the phase referencing errors result in a simple translation
of the FRB field image, we estimate the magnitude of this offset and its
uncertainty with a weighted mean of the measured offsets for each of
the continuum sources in the FRB field, after discarding any sources
that were resolved in either the ASKAP image or the reference image.
The magnitude of the offset ranged between 0 and 1.7 arcsec, with
uncertainties ranging from 0.3 to 0.6 arcsec.
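One common way to form such a weighted mean is inverse-variance weighting, sketched below for a single coordinate axis. The text does not state which weights were used, so the weighting scheme, the function name and the example offsets are assumptions made purely for illustration.

```python
import math


def weighted_mean_offset(offsets, sigmas):
    """Inverse-variance weighted mean of per-source offsets (arcsec)
    and the uncertainty of that mean, for one coordinate axis."""
    if len(offsets) != len(sigmas) or not offsets:
        raise ValueError("need equal-length, non-empty inputs")
    weights = [1.0 / s ** 2 for s in sigmas]
    weight_sum = sum(weights)
    mean = sum(w * x for w, x in zip(weights, offsets)) / weight_sum
    return mean, math.sqrt(1.0 / weight_sum)


# Hypothetical offsets (ASKAP minus reference position) for two
# unresolved continuum sources, with their measurement uncertainties.
mean_offset, mean_sigma = weighted_mean_offset([0.5, 1.0], [0.3, 0.6])
print(f"systematic offset ~ {mean_offset:.2f} +/- {mean_sigma:.2f} arcsec")
```

In a real field the same calculation would be repeated separately in right ascension and declination, and resolved sources would be discarded beforehand as described above.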


Host identification and spectroscopy 

The optical spectroscopy and redshift determinations for FRB 180924
and FRB 181112 have been outlined previously. Spectroscopy of
the host galaxies of FRB 190102 and FRB 190611 was conducted using
the FOcal Reducer and low dispersion Spectrograph 2 (FORS2) on the
European Southern Observatory's Very Large Telescope (VLT) on Cerro
Paranal. FORS2 was configured with the GRIS_300I grism, an OG590
blocking filter, and a 1.3″ wide slit, yielding a resolution $R_{\mathrm{FWHM}} \approx 550$. For
FRB 190102, 2 × 600 s exposures were obtained on 2019 March 25 UT,
while for FRB 190611, 2 × 1,350 s exposures were taken on 2019 July 12 UT.
These and associated calibration images were processed with the PypeIt
software package to derive flux and wavelength calibrated spectra.

For the host galaxy of FRB 190608, the optical spectrum from the
seventh data release (DR7) of the Sloan Digital Sky Survey (SDSS) was
retrieved from the IGMSPEC database.

Imaging of the host galaxies of FRB 180924, FRB 181112 and FRB 190102
was undertaken using FORS2 on the VLT, while the FRB 190608 and
FRB 190711 hosts were imaged with VLT/X-shooter. Imaging of the
host galaxies of FRB 190611 and FRB 190711 was undertaken using GMOS
on Gemini-South, from sets of 44 and 12 images of 100 s each in the
i-band, respectively.

The FORS2 images were first reduced with ESO Reflex, further
processed in Python, and then co-added using a median combine in
Montage. The WCS solutions were updated with Astrometry.net,
with further adjustments performed by comparison with Gaia or
Dark Energy Survey positions. The X-shooter images were reduced
using a custom Python pipeline making use of the package CCDPROC,
including measures to cope with prominent fringe patterns in I-band;
the images were then co-added and the astrometry adjusted with the
same method as above. The GMOS images were reduced and co-added
with PYRAF using standard procedures; the astrometry was adjusted
with the same method stated above. Projected distances were estimated
using the Javascript Cosmology Calculator.

Two of the FRBs in the gold-standard sample, FRB 190608 and
FRB 190711, have offsets larger than 1 arcsec from the galaxy light centroid.
FRB 190608, however, is in a z = 0.11 galaxy (that is, nearby) and
the chance projection is even less than 0.3%. Regarding FRB 190711
we estimate a 6.1 x 10° probability that an unrelated galaxy is within a
region out to a distance between the galaxy centroid and the outermost
edge of the FRB error circle (for the measured R(AB) = 23.7 ± 0.2 mag
as calibrated against the SkyMapper survey), and we estimate a probability
p = 1.9 x 10° for an unrelated galaxy to be within the FRB error
circle but below the detection limit of r = 25.5 mag. The remainder of
the host-galaxy associations for each FRB have a probability P < 10° of
a chance occurrence.

The radio burst dynamic spectra and host-galaxy optical spectra are 
shown in Extended Data Fig. 1. 


Estimating (DM .osmic) 

Central to the analysis is an estimate of the average DM_cosmic value,
⟨DM_cosmic⟩, as a function of redshift and for a given cosmology, as defined
in equation (2). Previous formulations have adopted similar definitions
but with less precise considerations for f_d(z), the fraction of cosmic
baryons in diffuse ionized gas. Our formulation considers the redshift
evolution of three dense baryonic components that do not contribute
to n_e: (1) stars; (2) stellar remnants (for example, white dwarfs and
neutron stars); and (3) the neutral ISM of galaxies. For (1), we interpolate
empirical estimates of the stellar mass density. For (2), we
adopt a previously published estimate, which is 30% of the stellar mass. For (3),
we assume the mass ratio of the ISM to stars is constant from z = 0 to 1
and adopt the present-day estimate of M_ISM/M_* = 0.38. The model also
allows for the partial ionization of helium but this is not relevant for the
FRBs considered here. All of these calculations are encoded in Python
in the public FRB repository (https://github.com/FRBs/FRB). Censuses
of the gas and star evolution of baryons in z < 1 systems constrain the
error in the fraction of neutral and non-diffuse baryons (that is, 1 − f_d(z))
to about 30% at present. Thus, with this component constituting ~15%
of the total baryon budget at z = 0, the correction to Ω_b h_70 is uncertain
at a level below 6% (0.30 × 0.15 ≈ 4.5%), well below the level of precision
that investigation of the current FRB sample permits. A discussion of the
constraints on f_d(z) possible in future with a larger sample of FRBs is
given elsewhere. 
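
As a rough illustration of how such an average could be evaluated, the sketch below integrates a standard flat-ΛCDM expression for the mean cosmic DM. Equation (2) is not reproduced in this section, so the integrand used here is an assumption, as are the constant f_d = 0.85 (the text describes an evolving f_d(z)), the helium mass fraction of 0.25, the default Ω_b h_70 = 0.047 and the function name `mean_dm_cosmic`. It is not the code in the FRB repository.

```python
import math

# Physical constants in SI units
C = 2.99792458e8             # speed of light, m/s
G = 6.67430e-11              # gravitational constant, m^3 kg^-1 s^-2
M_P = 1.67262192e-27         # proton mass, kg
PC = 3.0856775814913673e16   # parsec, m


def mean_dm_cosmic(z, omega_b_h70=0.047, f_d=0.85, omega_lambda=0.691,
                   y_he=0.25, n_steps=2000):
    """Mean cosmic DM (pc cm^-3) out to redshift z for a flat cosmology.

    Assumes a constant diffuse fraction f_d and fully ionized H and He.
    """
    h0 = 70.0e3 / (1.0e6 * PC)            # 70 km/s/Mpc expressed in s^-1
    omega_m = 1.0 - omega_lambda          # flatness
    electrons_per_baryon = 1.0 - y_he / 2.0
    # c * n_e,0 / H0 with n_e,0 = Omega_b * rho_crit / m_p, in m^-2
    prefactor = 3.0 * C * h0 * omega_b_h70 / (8.0 * math.pi * G * M_P)

    def integrand(zp):
        e_z = math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_lambda)
        return f_d * electrons_per_baryon * (1.0 + zp) / e_z

    # Trapezoidal integration from 0 to z
    dz = z / n_steps
    total = 0.5 * (integrand(0.0) + integrand(z))
    for i in range(1, n_steps):
        total += integrand(i * dz)
    integral = total * dz

    return prefactor * integral / (PC * 1.0e6)   # m^-2 -> pc cm^-3
```

Because the result is linear in `omega_b_h70`, doubling that argument doubles the returned DM, which is the proportionality the analysis exploits.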


Cosmological parameter estimation 

The ASKAP FRB measurements and localizations afford a new opportunity
to constrain our cosmological paradigm through estimations of
DM_cosmic and z_FRB. The cosmic DM is governed primarily by the baryonic
density Ω_b, the expansion rate of the Universe, H_0, and the fraction of
baryons in the diffuse phase, f_d(z). In the following, we assume a flat
cosmology with Ω_Λ = 0.691 (Planck15). The expansion rate is dominated
by this dark energy term for z < 0.7, so cosmological analysis of the
ASKAP FRBs is not sensitive to the precise value of Ω_Λ and, therefore,
to a close approximation, ⟨DM_cosmic⟩ ∝ Ω_b H_0. We therefore proceed to
place a constraint on this product. 

To construct a likelihood function ℒ from our FRB measurements,
we build a model for DM_cosmic and its uncertainty. The model is based
primarily on the cosmological parameters, but it must also allow for a
nuisance parameter which accounts for the DM of our Galactic halo
and that of the host galaxy: DM_MW,halo + DM_host. For the former term,
theoretical models informed by observation suggest DM_MW,halo ≈
50 pc cm⁻³ with a small dispersion, but we acknowledge that the
mean value is poorly constrained. We expect the variance in these terms
to be driven by DM_host, which follows from the large range in DM values
observed for the ISM of our Galaxy, even if one ignores whether FRBs
occur in ‘special’ locations within a galaxy. Furthermore, the very high
RM and the (probably related) large DM excess of FRB 121102 above
DM_cosmic imply at least one FRB with a large DM_host value. 

The PDF for DM_host has limited theoretical motivation. In the following,
we assume a log-normal distribution, which has two salient
features: (1) it is positive definite; (2) it exhibits an asymmetric tail
to large values. The latter property allows for high DM_host values that
might arise from gas local to the FRB, for example, an H II region or
circumstellar medium. Formally, we adopt a log-normal distribution:

$$p_{\rm host}({\rm DM}_{\rm host}\,|\,\mu,\sigma_{\rm host}) = \frac{1}{{\rm DM}_{\rm host}\,\sigma_{\rm host}\sqrt{2\pi}} \exp\left[-\frac{(\log {\rm DM}_{\rm host}-\mu)^2}{2\sigma_{\rm host}^2}\right] \tag{3}$$

This distribution has a median value of e^μ and standard deviation
e^(μ + σ_host²/2) (e^(σ_host²) − 1)^(1/2). We consider distributions with e^μ in the range
20-200 pc cm⁻³ and σ_host in the range 0.2-2.0. An illustrative set of these
probability distribution functions for DM_host is shown in Extended Data
Fig. 2. For consistency of interpretation of DM_host values from bursts
at disparate redshifts, the probability distribution function is referenced
to the rest frame of the host galaxy, so a correction
DM_host → DM_host (1 + z_FRB)⁻¹ is applied and the distribution normalized
accordingly; in practice, however, this redshift correction factor varies
only over the range 0.7 to 0.9 in the gold-standard sample. The inferred
dispersion in DM_host is consistent with the expected range of host DMs
given the galaxy type, morphology and orientation on the sky and the
distance of the FRB from the galaxy centre. However, we are unable to
state more than this at present, and remark that present estimates of
the DM_host contributions towards specific localized FRBs with two
quite different host galaxies are, respectively, in the range 30-81 pc cm⁻³
and <70 pc cm⁻³. This suggests that any correction on this basis could
be small for our sample. A further interesting aspect of our measurements
is that they are beginning to place limits on these corrections. 
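
To make the log-normal host model concrete, the following minimal sketch evaluates the PDF and its summary statistics. It assumes the standard log-normal form with μ the mean of ln(DM_host); the function names `p_host` and `host_summary` are illustrative and are not taken from the FRB repository.

```python
import math


def p_host(dm_host, mu, sigma_host):
    """Log-normal PDF for DM_host (pc cm^-3); mu is the mean of ln(DM_host)."""
    if dm_host <= 0.0:
        return 0.0  # the distribution is positive definite
    norm = dm_host * sigma_host * math.sqrt(2.0 * math.pi)
    arg = (math.log(dm_host) - mu) ** 2 / (2.0 * sigma_host ** 2)
    return math.exp(-arg) / norm


def host_summary(mu, sigma_host):
    """Return (median, mean, standard deviation) of the log-normal."""
    median = math.exp(mu)
    mean = math.exp(mu + 0.5 * sigma_host ** 2)
    std = mean * math.sqrt(math.exp(sigma_host ** 2) - 1.0)
    return median, mean, std


# Example with a median of 68.2 pc cm^-3 and sigma_host = 0.88
median, mean, std = host_summary(math.log(68.2), 0.88)
```

The mean exceeds the median for any σ_host > 0, which is the asymmetric tail to large values that motivates this choice.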

Altogether, DM_FRB^model = DM_cosmic(z) + DM_host + DM_MW,ISM, with the last
quantity estimated from NE2001 based on the FRB coordinates; given
the high Galactic latitudes of the present ASKAP sample we adopt a
value DM_MW,ISM = 30 pc cm⁻³ for these bursts. The mechanics of our
treatment of the DM_host, DM_MW,halo and DM_MW,ISM terms is described in
greater detail in equation (6). 

The model probability distribution for DM_cosmic is derived from theoretical
treatments of the IGM and galaxy haloes, with σ_DM dominated
by the physical variance in DM_cosmic. Extended Data Fig. 3 shows that
comparison against the analytic form (as used in other IGM-related
contexts)

$$p_{\rm cosmic}(\Delta) = A\,\Delta^{-\beta} \exp\left[-\frac{(\Delta^{-\alpha}-C_0)^2}{2\alpha^2\sigma_{\rm DM}^2}\right], \quad \Delta > 0 \tag{4}$$


provides an excellent match to the DM_cosmic distributions observed in
our semi-analytic models and in a hydrodynamic simulation, where
Δ = DM_cosmic/⟨DM_cosmic⟩. The motivation for this form is that in the limit
of small σ_DM, the distribution of DM should approach a Gaussian owing
to the Gaussianity of structure on large scales (a non-negligible component
of the variance of p_cosmic(Δ) comes from tens-of-megaparsec
structures); in addition, in the low-σ_DM limit the halo gas is more diffuse and so
the PDF approaches a Gaussian owing to the intersection of the line of
sight with more structures. Conversely, when the variance is large, this
PDF captures the large skew that results from a few large structures that
contribute to the DM of many sightlines. The sharp low-DM cutoff in
the distribution reflects the fact that a large component of the IGM is
highly diffuse, and displays much less variance than the halo-related
component, thus imposing a strict lower limit to the DM. The parameter
β is related to the inner density profile of gas in haloes. If the 3D density
profile scales as ρ ∝ r^-α, then β = (α + 1)/(α − 1), such that an isothermal profile
with α = 2 has β = 3 and an inner slope of α = 1.5 has β = 5. Such slopes
are consistent with those found in numerical simulations of intrahalo
gas. The indices α = 3 and β = 3 provide the best match to our models
(although we find that p_cosmic(Δ) is only weakly sensitive to order-unity
changes in these parameters, with β = 3 having the most flexibility for
our z = 0.11 measurement relative to β = 4). We use the parameter σ_DM in
p_cosmic(Δ) as an effective standard deviation even though formally the
standard deviation with β = 3 diverges logarithmically. We find that σ_DM
is closely tied to the true standard deviation when imposing motivated
maximum cutoffs for Δ on the distribution. The mean of the distribution
requires that ⟨Δ⟩ = 1, which fixes the remaining parameter C_0 in p_cosmic(Δ). 

Extended Data Fig. 3 shows models that use equation (4) for p_cosmic
relative to numerical calculations at redshifts that span the considered
range. The solid curves are the previously described semi-analytic
models, which assume that haloes below the specified mass have been
evacuated of gas, and the ‘swinds’ hydrodynamic simulation. The dashed
curves show the function evaluated for the best-fit σ_DM, and the
dot-dashed curves adopt the parameterization σ_DM = F z^-0.5 and scale
off the z = 0.5 best-fit value for σ_DM, yielding F of 0.09, 0.15 and 0.32 in 
our semi-analytic models in which haloes of 10“, 10% and 10” solar 
masses (M_⊙) are evacuated of their gas. The agreement of the dot-dashed
curves with the solid numerical model curves demonstrates that
σ_DM = F z^-0.5 approximates the evolution over the range of our
measurements. This scaling is further motivated in the Euclidean
limit, applicable for z ≪ 1, where ⟨DM_cosmic⟩ = n_e c z/H and
σ_DM = DM_halo √N/⟨DM_cosmic⟩, where n_e is the mean electron density,
DM_halo is the typical DM contributed by a single halo and
N is the number of haloes intersected, which is proportional to the path
length probed, cz/H. 
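
Written out step by step, the Euclidean argument gives the adopted redshift scaling. The short derivation below assumes that each intersected halo contributes a typical DM_halo and that the number of intersections fluctuates as √N; all redshift-independent factors are absorbed into F.

```latex
N \propto \frac{cz}{H}, \qquad
\langle \mathrm{DM}_{\rm cosmic} \rangle = \frac{n_e c z}{H} \propto z,
\qquad
\sigma_{\rm DM}
  = \frac{\mathrm{DM}_{\rm halo}\,\sqrt{N}}{\langle \mathrm{DM}_{\rm cosmic} \rangle}
  \propto \frac{\sqrt{z}}{z}
  = z^{-1/2}
\quad\Longrightarrow\quad
\sigma_{\rm DM} = F\,z^{-0.5}.
```

For example, halving the redshift from z = 0.5 to z = 0.25 increases σ_DM by a factor of √2 under this scaling.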

While our analytic parameterization describes the distribution of
DM_cosmic both in semi-analytic models and numerical simulations, we use
the more flexible semi-analytic models to set the marginalization range
in F that is used for some constraints on Ω_b h_70. Here we argue that the
considered semi-analytic models shown in Extended Data Fig. 3 span the
likely range of possible feedback scenarios. These models approximate
haloes as retaining their gas in a manner that traces the dark matter
above some mass threshold. This approximates the picture in many
simulations and analytic models in which the fraction of halo gas
retained is a strongly increasing function of halo mass before saturating
at unity. Furthermore, gas that is outside haloes is less effective at
contributing variance: take the example where gas is distributed out to
a distance R around a halo. The probability that a sightline intersects this gas
scales as R², leading to less shot noise for larger R, while the contribution
of each individual system scales as R⁻², leading to a smaller contribution
for larger R. This picture motivates the semi-analytic model’s approximation
that ejected gas diffusely traces large-scale structure. 

Simulations and models generally find that haloes below threshold 
masses in the range of about (10"-10")M, are evacuated of gas°!+°5, 
although some implementations of stellar and quasar feedback can result 
in different predictions. Halo gas in M>10"M, haloes is constrained 
by X-ray observations to mostly reside within such haloes®’. Our 
strongest feedback model, in which F= 0.09, pushes up against this 
observational limit. Our model with the weakest feedback assumes 
that dwarf-galaxy-sized haloes with 10M, retain their gas and yields 
F=0.32 (and we find that Fis just marginally larger if the 10’°M, haloes 
of the smallest dwarf galaxies retain their gas, haloes that would be just 
massive enough to overcome the pressure of the IGM and retain their 
gas®*). Thus our models span the range of likely feedback scenarios. 

Given this semi-analytic formalism, we proceed to estimate the model
likelihood by computing the joint likelihood of all FRBs:

$$\mathcal{L} = \prod_{i=1}^{N_{\rm FRBs}} P_i({\rm DM}'_{{\rm FRB},i}\,|\,z_i) \tag{5}$$

where P_i(DM'_FRB,i | z_i) is the probability of the total observed DM
corrected for the Galaxy:

$${\rm DM}'_{\rm FRB} = {\rm DM}_{\rm FRB} - {\rm DM}_{\rm MW,ISM} - {\rm DM}_{\rm MW,halo} = {\rm DM}_{\rm host} + {\rm DM}_{\rm cosmic} \tag{6}$$

For a burst at a given z_i and the model parameters we have:

$$P_i({\rm DM}'_{{\rm FRB},i}\,|\,z_i) = \int_0^{{\rm DM}'_{{\rm FRB},i}} p_{\rm host}({\rm DM}_{\rm host}\,|\,\mu,\sigma_{\rm host})\; p_{\rm cosmic}({\rm DM}'_{{\rm FRB},i} - {\rm DM}_{\rm host},\, z_i)\; {\rm d}\,{\rm DM}_{\rm host} \tag{7}$$

with p_cosmic the PDF for DM_cosmic (equation (4)) and p_host(DM_host | μ, σ_host)
the PDF for DM_host. With the likelihood function
defined, we construct a grid of Ω_b h_70, F, μ and σ_host values and marginalize
over the last three to obtain the constraint on Ω_b h_70. These results
are presented in Fig. 3 in the main text for the gold-standard sample.
Extended Data Fig. 4 presents the results of the same analysis when
FRBs 190523 and 190611 are included in the data set. 

To place confidence intervals on Q,H, and py, we use the likelihood 
ratio test statistic D: 


$$D(\Omega_b h_{70}, F, \mu, \sigma_{\rm host}) = 2\log\mathcal{L}_{\max} - 2\log\mathcal{L}(\Omega_b h_{70}, F, \mu, \sigma_{\rm host}) \tag{8}$$

where ℒ_max is the maximum value of ℒ, that is, for parameters maximizing
the likelihood. According to Wilks’ theorem, for a sufficiently large
number of FRBs, D will be distributed according to a χ² distribution
with n = 4 degrees of freedom. If the cumulative distribution function
of the χ²_n distribution is CDF(x), solving CDF(x) = p constrains the
Ω_b h_70, F, μ, σ_host parameter space to the region D < x at confidence level
(C.L.) p. 
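
For n = 4 the χ² cumulative distribution has a closed form, CDF(x) = 1 − e^(−x/2)(1 + x/2), so the threshold x at a given confidence level can be found with a few lines of code. The sketch below solves CDF(x) = p by bisection; the function names are illustrative and not part of the published analysis code.

```python
import math


def chi2_cdf_4dof(x):
    """CDF of the chi-squared distribution with 4 degrees of freedom."""
    if x <= 0.0:
        return 0.0
    return 1.0 - math.exp(-0.5 * x) * (1.0 + 0.5 * x)


def wilks_threshold(p, tol=1e-12):
    """Solve CDF(x) = p by bisection; the region D < x has confidence level p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while chi2_cdf_4dof(hi) < p:   # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_cdf_4dof(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x68 = wilks_threshold(0.68)
x95 = wilks_threshold(0.95)
```

Grid points with D below `x68` (or `x95`) then form the 68% (or 95%) confidence region in the four-parameter space.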

Uncertainties in these confidence estimates are probably dominated 
by systematic effects in the sample selection, and small number sta- 
tistics. To test both, we extend the gold-standard sample of five bursts 


to include FRBs 190523 and 190611. The resulting analysis is shown in
Extended Data Fig. 4. Compared with Fig. 3, the inclusion of two further
bursts shifts the maximum-likelihood estimate for Ω_b h_70 at 68% C.L. 
from 0.051705! to 0.042°0-04 | that is, consistent with the original 
uncertainties. This does not mean that there is no systematic bias, nor 
that Wilks’ theorem holds precisely for our sample, but rather that 
any such effects are minor compared to the inherent uncertainties 
from our small sample size of localized bursts. 


Accounting for biases in P(DM,z) 

The cosmological evolution of the FRB population, and its intrinsic 
luminosity function, can strongly influence the observed/expected 
distribution of FRBs in redshift-DM space”, P(DM,z). We therefore 
perform our likelihood maximization over P(DM|z) only. This discards 
the information contained in the redshifts of our detected FRBs, but 
makes the procedure more robust against factors influencing the red- 
shift distribution. 

The remaining bias comes from changing sensitivity as a function of 
DM. This can be either direct, through DM-smearing within frequency 
channels, or indirect, through increased scatter broadening associated 
with the same gaseous structures causing the observed DM. 

We wish to compute the dispersion measure limit, DM_cutoff, at which
a given FRB would have been undetectable. The S/N of a detected burst
depends on its intrinsic (or scatter-broadened) width, w, the time
resolution of the detection system, t_res, and the amount of dispersion
measure time smearing between adjacent 1 MHz spectral channels,
t_smear(DM). The resulting width of the pulse is:

$$\Delta t_{\rm obs}({\rm DM}) = \sqrt{w^2 + t_{\rm res}^2 + t_{\rm smear}^2({\rm DM})} \tag{9}$$

We compute DM_cutoff such that the burst, detected by our system with a
signal-to-noise ratio of s at DM_obs, would have fallen below our detection
threshold of s_t = 9.0. For each burst we thus solve

$$\frac{s_t}{s} = \sqrt{\frac{\Delta t_{\rm obs}({\rm DM}_{\rm obs})}{\Delta t_{\rm obs}({\rm DM}_{\rm cutoff})}} \tag{10}$$

Extended Data Table 1 lists the DM, widths, time resolution, detection
S/N values and derived DM_cutoff values for each of the bursts in
our sample. 
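
The cutoff calculation can be sketched as follows, assuming that S/N scales as the inverse square root of the observed width at fixed fluence and that the intra-channel smearing follows the standard approximation t_smear ≈ 8.3 μs × DM × Δν(MHz) × ν(GHz)⁻³. The smearing formula, the function names and the burst width used in the example are assumptions for illustration; only the 1 MHz channel width, the 1,295 MHz centre frequency and the S/N and time-resolution values come from the text and Extended Data Table 1.

```python
import math


def t_smear_ms(dm, chan_bw_mhz=1.0, freq_mhz=1295.0):
    """Dispersion smearing across one channel, in ms (assumed standard form)."""
    return 8.3e-3 * dm * chan_bw_mhz * (freq_mhz / 1000.0) ** -3


def width_ms(dm, w_ms, t_res_ms, **kw):
    """Observed pulse width: quadrature sum of w, t_res and t_smear(DM)."""
    return math.sqrt(w_ms ** 2 + t_res_ms ** 2 + t_smear_ms(dm, **kw) ** 2)


def dm_cutoff(dm_obs, snr, w_ms, t_res_ms, snr_threshold, **kw):
    """DM at which a burst detected with `snr` at `dm_obs` drops to threshold."""
    if snr <= snr_threshold:
        return dm_obs   # already at or below threshold
    # S/N ~ width^(-1/2), so the width may grow by (snr / threshold)^2
    width_cut = width_ms(dm_obs, w_ms, t_res_ms, **kw) * (snr / snr_threshold) ** 2
    smear_cut = math.sqrt(width_cut ** 2 - w_ms ** 2 - t_res_ms ** 2)
    return smear_cut / t_smear_ms(1.0, **kw)   # smearing is linear in DM


# Hypothetical burst: DM = 300 pc cm^-3, intrinsic width 1.0 ms (both assumed),
# with S/N = 21.1 and t_res = 0.864 ms as listed for one of the ASKAP bursts.
cutoff = dm_cutoff(300.0, 21.1, w_ms=1.0, t_res_ms=0.864, snr_threshold=9.5)
```

Because the smearing term enters in quadrature, bursts that are intrinsically wide or coarsely sampled tolerate a much larger DM before dropping below threshold than narrow, finely sampled ones.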


MCMC analysis 

To complement the likelihood analysis presented in the main text, we
have performed Bayesian inference of a model constructed to describe
the DM and redshift measurements of the FRBs. The model consists of
four parameters describing two PDFs for distinct components of the
dispersion measure: (i) DM_cosmic, which describes the extragalactic
dispersion measure including both the diffuse IGM and the gas associated
with intervening galactic haloes; and (ii) DM_host, which describes
ionized gas associated with the host galaxy (we assume a fixed DM_MW,halo
value of 50 pc cm⁻³ for the Galactic halo). We parameterize the former
PDF with equation (4), that is, p_cosmic(Δ) with Δ = DM_cosmic/⟨DM_cosmic⟩ and
⟨DM_cosmic⟩ the average value for the assumed cosmology (equation (2)).
The foregoing subsection on cosmology and host-galaxy parameter
estimation describes theoretical treatments that motivate one to
adopt α = 3 and β = 3 in equation (4) and to adopt the functional form
σ_DM = F z^-0.5 for its dispersion parameter. For ⟨DM_cosmic⟩, we modulate
its amplitude via the product Ω_b h_70. Therefore, p_cosmic(Δ) is governed
by two free parameters: F and Ω_b h_70.

We adopt the same p_host(DM_host | μ, σ_host) PDF described earlier, with
free parameters exp(μ) and σ_host. From these two PDFs we construct a
likelihood function for the set of observed FRBs using equations (5)
and (7). Note that measurement uncertainty in DM_FRB does not enter
into the evaluation of ℒ because the dispersion from DM_cosmic and
DM_host is much greater. Put another way, our model is constructed to
describe the observed distribution of DM'_FRB values with an anticipated
dispersion substantially exceeding the uncertainty in individual DM_FRB
measurements (typically <1 pc cm⁻³). 

Effectively, two of the parameters (Ω_b h_70, μ) set the amplitude of the
DM-z relation and two describe its dispersion (F, σ_host). We anticipate a
degeneracy within each pair, although if DM_host is approximately independent
of redshift then this apparent degeneracy may be resolved.
Only the dispersion in DM_cosmic, parameterized by F, allows for large
negative excursions from the mean relation. Lastly, we introduce priors
for the four parameters based on a combination of experimentation,
physical expectation and scientific motivation. For Ω_b h_70, the scientific
focus of this manuscript, we adopt a uniform prior ranging from
0.015 to 0.095, which easily spans the Planck15 estimate. For F, we adopt a
uniform prior in the interval (0.01, 0.5), a larger range than anticipated
by our models in the frequentist analysis presented above. Regarding
exp(μ), we adopt a uniform prior in the interval [20, 200] pc cm⁻³.
We consider lower values for the mean to be non-physical and we will
find that larger values are disfavoured by the observations. Lastly, we
assume a uniform prior for σ_host in the interval [0.2, 2]. The larger σ_host
values give non-negligible probability for DM_host values in excess of
1,000 pc cm⁻³. Future observations, especially an ensemble of FRBs at
low redshift, will better inform these priors on μ and σ_host. 
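
The four uniform priors can be written down as a single log-prior function, as in the minimal sketch below. The dictionary keys and the function name `log_prior` are hypothetical, the interval bounds are those quoted in the text, and all bounds are treated as inclusive for simplicity. This is a plain-Python illustration rather than the PyMC3 model used for the published chains.

```python
import math

# Uniform prior bounds quoted in the text (exp_mu in pc cm^-3)
PRIORS = {
    "omega_b_h70": (0.015, 0.095),
    "F": (0.01, 0.5),
    "exp_mu": (20.0, 200.0),
    "sigma_host": (0.2, 2.0),
}


def log_prior(params):
    """Sum of independent uniform log-priors; -inf outside any interval."""
    total = 0.0
    for name, (lo, hi) in PRIORS.items():
        value = params[name]
        if not lo <= value <= hi:
            return -math.inf
        total -= math.log(hi - lo)   # log of the uniform density 1 / (hi - lo)
    return total


inside = log_prior({"omega_b_h70": 0.056, "F": 0.31,
                    "exp_mu": 68.2, "sigma_host": 0.88})
outside = log_prior({"omega_b_h70": 0.2, "F": 0.31,
                     "exp_mu": 68.2, "sigma_host": 0.88})
```

Adding this log-prior to the log-likelihood gives the log-posterior that a sampler explores; with flat priors the posterior shape inside the bounds is set entirely by the likelihood.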

Adopting the above likelihood and priors, we performed a Bayesian
inference of the four parameters using the gold-standard sample
of FRB measurements and standard MCMC techniques. These were
performed with the PyMC3 software package using slice sampling
and four independent chains of 40,000 samples after a tuning period
of 2,000 samples. Extended Data Fig. 5 presents a corner plot of the
combined samples. A principal result is that the data yield an Ω_b h_70
distribution fully consistent with the independent estimates from the CMB,
BBN and supernovae. Quantitatively, the Ω_b h_70 samples have a median
value of 0.056 and a 68% confidence interval spanning [0.046, 0.066]
(see Extended Data Table 2). Taken strictly, at 95% confidence these
FRB measurements require a universe with at least 70% of the baryons
inferred from BBN and CMB analysis. These results hold despite the
weak priors placed on the PDF for DM_host, but we warn that they are
dependent on the value assumed for DM_MW,halo. 

Extended Data Fig. 5 also reveals the anticipated anti-correlations
between μ and Ω_b h_70 and (to a lesser extent) F and σ_host. We expect these
to weaken as the FRB sample grows in size and redshift range. Lastly, we
note that the F and μ parameters have maximal probability at one edge
of their assumed prior intervals. Values of F that are on the higher side
of the considered range (a range that spans the possible model space)
are modestly favoured. For exp(μ), we consider 20 pc cm⁻³ to be the lowest
sensible mean contribution from the host galaxy (which could also
mean a lower value for the Galactic halo than adopted here). 

The frequentist analysis in the main text and this Bayesian MCMC
analysis agree very well on the gold-standard sample. The most notable
differences are that the MCMC analysis prefers a distribution for
exp(μ) that is more peaked towards smaller exp(μ) values and one for F that
peaks towards larger values, although with no value for exp(μ) or F
strongly preferred by either analysis. When the parameters are not
well constrained one would not expect perfect agreement between the
methods, as, for example, the Bayesian analysis is sensitive to our prior
on exp(μ) when this parameter is not well constrained. It is expected
that the differences between the two methods will become smaller with
more data. Already for the seven-burst sample (Extended Data Fig. 4),
the distribution for F in the frequentist analysis is more similar to the
MCMC analysis of the gold-standard sample. 


Data availability 


The data sets generated during and/or analysed during this study 
are available at https://data-portal.hpc.swin.edu.au/dataset/ 
observations-of-four-localised-fast-radio-bursts-and-their-host-galaxies. 


Code availability 
Custom code is available at https://github.com/FRBs/FRB. 


Extended Data Fig. 1 | The pulse profiles and host galaxy spectra of the four
new FRBs presented here. Top row, ASKAP data. The pulse profiles (upper
subpanels, labelled A) and the radio dynamic spectra (lower subpanels,
labelled B) show the detections by the ASKAP incoherent capture system (ICS)
of FRB 190102 with a time resolution of 0.864 ms, and of FRBs 190608, 190611
and 190711 with a resolution of 1.728 ms. The spectral resolution is 1 MHz across
the 336-MHz bandwidth. Bottom row, the SDSS (HG 190608) and VLT/FORS2
(HG 190102, HG 190611 and HG 190711) optical spectra of the host galaxies
located at the respective FRB positions (see Table 1), and the spectral lines from
which their redshifts are deduced. 

Extended Data Fig. 2 | The shape of the distribution used to model the
host-galaxy dispersion measure, DM_host. The behaviour of the probability
distribution p_host(DM_host | μ, σ_host) is shown for an illustrative set of
parameters spanning the range of plausible values for μ and σ_host. 

Extended Data Fig. 3 | The expected contribution of the cosmic baryons to
the dispersion measure. The probability distribution of DM_cosmic due to the
cosmic baryons, p(DM), in semi-analytic models and simulations, as encoded in
black, blue, red and green in order of increasing redshift (z; see key), is
compared to the analytic form used in our analysis (equation (4)). The
thinner solid curves show semi-analytic models in which the minimum halo
mass that can resist feedback and retain its gas is given by M_min. The dashed
curves are the best-fit analytic function, and the dot-dashed curves assume the
σ_DM = F z^-0.5 scaling from the z = 0.5 best fit, for which F = 0.32, 0.15 and 0.09 for
the top, middle and bottom panels, respectively. Because of the success of this
Euclidean-space scaling, we adopt it in our analysis. The thicker green solid
curve in the bottom panel is calculated from a hydrodynamic simulation. 

Extended Data Fig. 4 | The density of cosmic baryons derived from the
extended FRB sample. The constraints on the IGM parameters Ω_b h_70 and F, and
on the host-galaxy parameters μ and σ_host, for a log-normal host-galaxy DM
distribution are shown in an identical manner to Fig. 3, but derived using the
seven-burst sample (that is, including the five gold-standard bursts as well as
FRBs 190523 and 190611). 

Extended Data Fig. 5 | Constraints on the cosmic baryon density and FRB
host-galaxy parameters derived using a Bayesian approach. The results of a
Markov Chain Monte Carlo (MCMC) analysis based on our five-FRB
gold-standard sample presented in the main text demonstrate broad
agreement with the results of the frequentist analysis presented in Fig. 3. The
outermost vertical lines on the histogram denote the confidence region
corresponding to that parameter, with the central line indicating the mean
value. 


Extended Data Table 1 | Detection properties of the ASKAP FRBs 


FRB          Detection S/N (a)   t_res (ms)   DM_cutoff (pc cm⁻³)
180924       21.1                0.864
181112       19.3                0.864
190102       14.0                0.864
190608       16.1                1.728
190611 (b)   9.3                 1.728
190711       23.8                1.728


The values of DM_cutoff denote the maximum DM at which a burst with those properties listed would have been detectable at an S/N threshold s_t = 9.5 with the ASKAP telescope back-end at a 
centre frequency of 1,295 MHz given the burst width w and search time resolution t_res, and its 1 MHz spectral resolution. 

(a) The detection S/N value listed is that reported by the incoherent detection pipeline for the telescope beam in which the detection signal was strongest. 

(b) The voltage-capture system enables the follow-up of subthreshold events detected in the incoherent pipeline, and subsequent interferometric validation, which would increase the S/N of a 
valid event by a factor 25. The reported DM_cutoff is referenced to the threshold s_t = 9.0 relevant to the observing run during which this event was detected. 


Extended Data Table 2 | Results of the MCMC analysis 


Parameter   Unit      Prior            Median  68%          95%
F           None      U(0.011,0.5)     0.31    0.15,0.44    0.04,0.49
exp(μ)      pc cm⁻³   U(20.0,200)      68.2    33.2,127.8   22.0,181.1
σ_host      None      U(0.2,2)         0.88    0.43,1.53    0.24,1.91
Ω_b h_70    None      U(0.015,0.095)   0.056   0.046,0.066  0.038,0.073


Estimates of the range of parameters from this analysis at the 68% and 95% confidence levels. These are consistent with the results of the approach described in the main text. 


Molecular spectroscopy offers opportunities for the exploration of the fundamental
laws of nature and the search for new particle physics beyond the standard model.
Radioactive molecules, in which one or more of the atoms possesses a radioactive
nucleus, can contain heavy and deformed nuclei, offering high sensitivity for
investigating parity- and time-reversal-violation effects. Radium monofluoride, RaF,
is of particular interest because it is predicted to have an electronic structure
appropriate for laser cooling, thus paving the way for its use in high-precision
spectroscopic studies. Furthermore, the effects of symmetry-violating nuclear
moments are strongly enhanced in molecules containing octupole-deformed
radium isotopes. However, the study of RaF has been impeded by the lack of stable
isotopes of radium. Here we present an experimental approach to studying
short-lived radioactive molecules, which allows us to measure molecules with
lifetimes of just tens of milliseconds. Energetically low-lying electronic states were
measured for different isotopically pure RaF molecules using collinear resonance
ionisation at the ISOLDE ion-beam facility at CERN. Our results provide evidence of
the existence of a suitable laser-cooling scheme for these molecules and represent a
key step towards high-precision studies in these systems. Our findings will enable
further studies of short-lived radioactive molecules for fundamental physics
research. 


Molecular systems provide a versatile physical environment in which
to study the fundamental symmetries of nature and the interactions
and properties of subatomic particles. Among the four known
fundamental forces, the weak force is the only one that is known to
violate symmetry with respect to spatial inversion of all particle coordinates
(known as parity violation), giving rise to various intriguing
phenomena. Some of these parity-violating effects have been measured
with high accuracy in atomic systems, contributing to the most
stringent low-energy tests of the Standard Model of particle physics.
In certain molecules, effects resulting from both parity violation
(P-odd) and time-reversal violation (T-odd) are considerably enhanced
with respect to atomic systems, offering the means to explore
unknown aspects of the fundamental laws of physics. The strengths of
these interactions scale with atomic number, nuclear spin and nuclear
deformation, and so molecular compounds of heavy radioactive nuclei
are predicted to exhibit unprecedented sensitivity, with an enhancement
of more than two orders of magnitude for effects that are P-odd
or simultaneously P- and T-odd. 


However, the experimental knowledge of radioactive molecules
is scarce, and quantum chemistry calculations often constitute the
only source of information. Molecules possess complex quantum level
structures, which renders spectroscopy of their structure considerably
more challenging than that of atoms. Moreover, major additional
experimental challenges must be overcome to study molecules containing
heavy and deformed nuclei, which can have lifetimes of just a
few milliseconds. These radioactive nuclei are very rare in nature or do
not occur naturally and so must be produced artificially at specialized
facilities, such as at the Isotope Separator On-line Device (ISOLDE) at
CERN. Furthermore, molecules containing short-lived isotopes can only 
be produced in quantities smaller than 10° g (typically with rates of 
less than 10° particles s“). Thus, spectroscopic studies require particu- 
larly sensitive experimental techniques adapted to the properties of 
radioactive ion beams and the conditions present at radioactive-beam 
facilities. Here, we present an approach for performing laser spectros- 
copy of short-lived radioactive molecules, using the highly sensitive 
collinear resonance ionization method. These results provide the first
spectroscopic information of RaF, including isotopologues composed
of radioactive isotopes with lifetimes as short as a few days. To our
knowledge, this is the first laser spectroscopy study performed on
a molecule containing a short-lived isotope. Moreover, this experimental
scheme can be applied to study other radioactive molecules,
even those composed of isotopes with lifetimes as short as a few tens
of milliseconds. 

Since the direct cooling of diatomic molecules with lasers was
experimentally demonstrated, there has been a wealth of studies
on laser-cooling techniques and applications in molecular physics.
In contrast to other heavy-atom molecules, RaF is predicted to have
highly closed excitation and re-emission optical cycles, which would
make it ideal for laser cooling and trapping. Moreover, owing to the
recently discovered pear-shaped nuclear deformation of certain radium
isotopes, the interactions of the electrons with the P-odd nuclear anapole
moment as well as with the P,T-odd nuclear Schiff and magnetic
quadrupole moments are predicted to be enhanced by more than two
orders of magnitude. Hence, these molecules could provide a
unique environment in which to measure these symmetry-violating
nuclear moments. 


Experimental scheme 

Figure 1 shows a diagram of the experimental setup used to produce
and study the RaF molecules. As a first step, radium isotopes were produced
by diffusion out of an irradiated target (see Methods section
‘Production of RaF molecules’). RaF⁺ molecular ions were formed upon
injection of CF₄ gas into the target environment. The molecular ions
were extracted from the ion source by applying an electrostatic field,
and molecules containing one specific radium isotope were selected
with a high-resolution magnetic mass separator (Δm/m = 1/2,000).
The ions were collisionally cooled in a radio-frequency quadrupole
(RFQ) trap filled with helium gas at room temperature (about 300 K). 

Fig. 1 | Experimental scheme for the production and study of short-lived
radioactive molecules. Radioactive radium isotopes were created by
impinging 1.4-GeV protons from the CERN Proton Synchrotron Booster (PSB)
on a uranium carbide (UCₓ) target. Radium monofluoride cations (RaF⁺) were
produced by passing tetrafluoromethane (CF₄) gas through the activated UCₓ
target at 1,300 °C. Molecular ions were extracted from the source,
mass-selected and injected into a helium-filled RFQ trap, where they were
accumulated for 10 ms. Bunches of molecular ions were extracted and
neutralized in flight by charge exchange with neutral sodium atoms. Neutral
RaF molecules were overlapped with different laser beams (step 1, TiSa, Dye1
and Dye2, and step 2, a 355-nm laser; see Methods section ‘Laser setup’) in a
collinear geometry. Resonantly reionized molecules were deflected onto a
particle detector. The resonance ionization scheme is shown at top right. At
bottom, molecular orbitals are shown schematically. Nuclear positions within
the molecules are coarsely indicated by a grey sphere (Ra) and green sphere (F),
and the sigma bond between the atoms is indicated by the grey cylinders.
Further details are provided in ‘Experimental scheme’. 


After up to 10 ms of cooling time, bunches of RaF⁺ with a 4-μs temporal
width were released and accelerated to 39,998(1) eV, before
entering into the Collinear Resonance Ionisation Spectroscopy
(CRIS) setup. At the CRIS beam line, the ions were first neutralized
in-flight by passing through a collision cell filled with a sodium
vapour, inducing charge exchange according to the reaction RaF⁺ + Na →
RaF + Na⁺. As the ionization energy of RaF is estimated to be close to
that of sodium (5.14 eV), the neutralization reaction dominantly populates
the RaF X²Σ⁺ electronic ground state. Molecular pseudo-orbitals
obtained from one-component open-shell (neutral) or closed-shell 
(ion) restricted Hartree-Fock calculations with an energy-consistent 
effective core potential on radium are shown schematically in Fig. 1 (bottom).
The lowest unoccupied molecular orbital in RaF⁺, which is mainly
of non-bonding character, becomes occupied by an unpaired electron
(symbolized in Fig. 1 by a red sphere together with an arrow representing
the electron spin) upon neutralization. This is shown schematically as an 
isodensity, with lobes in slightly transparent blue and transparent red 
indicating different relative phases of the single-electron wavefunction. 

After the charge-exchange reaction, non-neutralized RaF⁺ ions
were deflected out of the beam, and the remaining bunch of neutral
RaF molecules was overlapped in time and space by several (pulsed)
laser beams in a collinear arrangement, along the ultrahigh-vacuum
(10° mbar) interaction region of 1.2-m length. Laser pulses (step 1) 
of tunable wavelength were used to resonantly excite the transition 
of interest, and a high-power 355-nm laser pulse (step 2) was used to 
subsequently ionize the excited RaF molecules into RaF⁺ (see Fig. 1,
top). The resonantly ionized molecules were then separated from the 
non-ionized molecules by deflecting the ions onto a particle detec- 
tor. When the excitation laser is on resonance with a transition in the 
molecule (step 1 in Fig. 1), the second laser pulse ionizes the molecule, 
producing a signal at the detector. Molecular excitation spectra were 
obtained by monitoring the ion counts as a function of the wavenumber 
of the first laser.

Fig. 2 | Examples of vibronic spectra measured for ²²⁶RaF. a-f, The counts on
the particle detector were measured as a function of the laser wavenumber of
the resonant step. A fixed wavelength (355 nm) was used for the ionization step.
a, The observed peaks corresponding to the vibronic spectra of the Δv = 0 band
system of v″ = 0, 1, 2, 3, 4, scanned by the grating Ti:sapphire laser. b, c, The
pulsed dye laser was used to scan electronic transitions in different wavelength
ranges: the Δv = +1 band system of the A²Π₁/₂ ← X²Σ⁺ transition with v″ = 0, 1, 2, 3, 4
(b) and the (v′, v″) = (0, 1) and (1, 2) bands (c). d-f, The corresponding transitions to
other electronic states: A²Π₃/₂ ← X²Σ⁺ (d), B²Δ₃/₂ ← X²Σ⁺ (tentatively assigned; e)
and C²Σ⁺ ← X²Σ⁺ (f). The shape of the spectra is due to population distribution of
different rotational states. The solid lines show the fit with skewed Voigt


Only theoretical predictions were available for the excitation energies
of RaF, and so finding the transition experimentally required
scanning a large wavelength range (>1,000 cm⁻¹). The prediction for
the A²Π₁/₂-X²Σ⁺ (0, 0) transition, for example, was 13,300 cm⁻¹, with an
accuracy estimated to be within 1,200 cm⁻¹. Given the bandwidths
of the commonly available lasers (<0.3 cm⁻¹), the scan of such a
large wavelength region on samples produced at rates below 10° molecules
s⁻¹ represented a major experimental challenge. To optimize the
search of molecular transitions, three broadband lasers were scanned 
simultaneously and both collinearly and anti-collinearly (see Methods 
section ‘Laser setup’). 


Results 


The predicted region for the A²Π₁/₂ ← X²Σ⁺ transition was scanned at a
speed of 0.06 cm⁻¹ s⁻¹, covering a range of 1,000 cm⁻¹ in about 5 h, using
the six simultaneously applied scanning regions. After a few hours of
scanning on a beam of ²²⁶RaF, a clear sequence of vibronic absorption
signals was recorded. The measured spectrum assigned to the (v′, v″)
vibrational transitions (0, 0), (1, 1), (2, 2), (3, 3) and (4, 4) of the
A²Π₁/₂-X²Σ⁺ band system is shown in Fig. 2a. Weaker band structures, which were
found at about +440 cm⁻¹ and −440 cm⁻¹ with respect to the (0, 0) band,
were assigned to the Δv = ±1 transitions (v′, v″) = (1, 0), (2, 1), (3, 2), (4, 3),
(5, 4) and (v′, v″) = (0, 1), (1, 2), respectively (Fig. 2b, c). The quantum
number assignment for Δv = −1 is tentative, owing to the highly dense
structure of overlapping vibronic bands.

In addition to the A²Π₁/₂-X²Σ⁺ band system, we found spectroscopic
signatures of electronic transitions to higher-lying states. Some
examples of recorded spectra are shown in Fig. 2d-f, along with the
energy-level scheme. We assign the observed transitions as follows:
1) The band system around 15,325 cm⁻¹ (Fig. 2d) is attributed to the
A²Π₃/₂-X²Σ⁺ transition, owing to the complex rovibrational structure
expected to arise from the intense satellites that are possible in these
transitions. Because the bands are comparatively strong, they are
assigned to the Δv = 0 band system. Although the individual assignments
to vibrational transitions must be considered to be tentative,
owing to the congested structure of the Franck-Condon profile, the
Δv = 0 assignment is substantiated because no additional structure
was located within a relative range of −400 to +400 cm⁻¹. The band
system located around 15,143 cm⁻¹ (Fig. 2e) is tentatively assigned to
the B²Δ₃/₂-X²Σ⁺ transition by virtue of the good agreement with the
computed excitation energies to the Ω = 3/2 state of mixed Δ/Π character.
This mixing provides intensity to the one-photon transition
from a Σ state into the Δ manifold. The computed Born-Oppenheimer
potentials for this Ω = 3/2 state and the electronic ground state are,
however, highly parallel, which would suggest a sparser Franck-Condon
profile than was observed experimentally. However, we note that the
related B²Δ₃/₂-X²Σ⁺ transition in BaH and BaD was reported to have a
perturbed character owing to mixing between electronic levels. Thus,
in the present case, a vibrational profile that is richer than expected
from adiabatic potentials cannot be ruled out a priori. The band system
with origin at 16,175 cm⁻¹ (Fig. 2f) is assigned to the C²Σ⁺-X²Σ⁺ transition
on the basis of the observed Franck-Condon profile, which is in good
agreement with the computed harmonic vibrational energy spacings
as well as the expected intensity distribution, and is in a wavenumber
region that is only slightly lower than predicted. All measured and
assigned vibronic bands of the four electronic transitions are listed
in Table 1.

profiles. g, Scheme of the molecular energy levels. The estimated upper limit of
the ionization potential (IP) is indicated. Three essential properties for laser
cooling of RaF molecules were identified: 1) the short lifetime of the excited
state ²Π₁/₂ (T₁/₂ < 50 ns), which will allow for the application of strong optical
forces; 2) dominant diagonal transitions, (Δv = 0)/(Δv = +1, Δv = 0) > 0.97,
indicating a large diagonal Franck-Condon factor; and 3) the expected
low-lying electronic states B²Δ₃/₂, A²Π₃/₂ and C²Σ⁺ were found to be above the
A²Π₁/₂ states, which will enable efficient optical-cooling cycles. Wavenumbers
in the spectra are given in the rest frame of the molecule. In a-f, the error bars
show the statistical uncertainties (1 standard deviation) for the number of
resonantly ionized molecules obtained within each laser frequency interval.

The measured A²Π₁/₂-X²Σ⁺ (0, 0) band centre, 13,287.8(1) cm⁻¹, is
in excellent agreement with the ab initio calculated value of
13,300(1,200) cm⁻¹. In accordance with theoretical predictions,
we found vibronic transitions with Δv = 0 to be much stronger than
those with Δv = +1. For most of the measurements, the power density
used for the resonant step was 100(5) μJ cm⁻² per pulse, as measured
at the entry window of the beam line. Reducing the power by 50% did
not reduce the resonant ionization rate, indicating that these transitions
were measured well above saturation. The much weaker vibrational
transitions with Δv = +1 were scanned with a pulsed dye laser of
500(5) μJ cm⁻² power density per pulse (bandwidth of 0.1 cm⁻¹). The
Δv = +1 transitions were measured well above saturation and with laser
beams of different characteristics, and so a precise estimation of the
Franck-Condon factors could not be obtained. Instead, a lower limit
of 0.97 for the peak intensity ratio (0, 0)/(0, 1) was derived, indicating
highly diagonal Franck-Condon factors, an essential property for laser
cooling.


Table 1 | Measured vibronic transitions of ²²⁶RaF from the X²Σ⁺
electronic ground state to the excited A²Π and B²Δ states

Transition             v′-v″     ν̃ (cm⁻¹)
A²Π₁/₂ ← X²Σ⁺          0-0       13,284.7(5)
                       1-1       13,278.5(5)
                       2-2       13,272.4(5)
                       3-3       13,266.4(10)
                       4-4       13,260.2(10)
                       1-0       13,716.9(5)
                       2-1       13,707.4(5)
                       3-2       13,698.0(5)
                       4-3       13,688.6(10)
                       5-4       13,679.4(10)
                       (0-1)     12,846.3(10)
                       (1-2)     12,843.1(10)
(B²Δ₃/₂ ← X²Σ⁺)        0-0       15,142.7(5)
                       1-1       15,132.8(10)
                       2-2       15,123.0(10)
                       3-3       15,113.2(10)
A²Π₃/₂ ← X²Σ⁺          (0-0)     15,344.6(50)
                       (1-1)     15,325.0(80)
                       (2-2)     15,309.4(100)
C²Σ⁺ ← X²Σ⁺            0-0       16,175.2(5)
                       1-1       16,164.2(5)
                       2-2       16,153.4(5)
                       3-3       16,142.4(10)

The values indicate the band head positions.
Combined statistical and systematic uncertainties are included in parentheses.
The B²Δ₃/₂ ← X²Σ⁺ assignment is tentative.


Table 2 | ²²⁶RaF Morse potential parameters for X²Σ⁺
electronic ground and A²Π₁/₂ excited states

By measuring the resonant ionization rate for different time delays
between the excitation and ionization laser pulses, we obtained an
upper limit for the lifetime of the excited state ²Π₁/₂ (v′ = 0): T₁/₂ < 50 ns.
The measurements were performed with the wavenumber of the resonant
laser fixed at the resonance value of the transition (v′, v″) = (0, 0).
The resonant ionization rate dropped by more than 70% for delays
above 50 ns. This short lifetime corresponds to a large spontaneous
decay rate (>2 × 10⁷ s⁻¹), which would allow for the application of strong
optical forces for laser cooling. An additional concern for the suitability
of laser cooling is related to the existence of metastable states lying
energetically below the ²Π₁/₂ level, which could prevent the application
of a closed optical-cooling loop, a major problem encountered
for BaF. In contrast to BaF, all other predicted electronic states
(²Π₃/₂, ²Δ₃/₂ and ²Σ⁺) in RaF were found to be energetically above the ²Π₁/₂
state, indicating that its electronic structure will allow for efficient
optical-cooling cycles.
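
To put the lifetime limit in perspective, the short sketch below converts the 50-ns bound into a decay-rate bound and then into a rough lower bound on the maximum photon-scattering deceleration. The saturated two-level expression a = h ν̃ Γ / (2m), the use of the (0, 0) band centre as the cooling wavenumber and the molecular mass of 245 u are assumptions made for illustration only; they are not part of the measurement.

```python
# Rough illustration: lifetime bound -> decay rate -> scattering deceleration.
# Assumptions (not from the measurement): ideal saturated two-level system,
# molecular mass of 245 u, cooling transition at the (0, 0) band centre.

H = 6.62607015e-34       # Planck constant, J s
U = 1.66053906660e-27    # atomic mass constant, kg


def decay_rate(lifetime_s: float) -> float:
    """Spontaneous decay rate (s^-1) for an excited-state lifetime in seconds."""
    return 1.0 / lifetime_s


def max_deceleration(wavenumber_cm: float, gamma: float, mass_u: float) -> float:
    """Saturated two-level deceleration a = h * nu_tilde * gamma / (2 m), in m s^-2."""
    photon_momentum = H * wavenumber_cm * 100.0  # h * nu_tilde, with cm^-1 -> m^-1
    return photon_momentum * gamma / (2.0 * mass_u * U)


gamma_min = decay_rate(50e-9)  # lower bound, because the lifetime is below 50 ns
a_min = max_deceleration(13287.8, gamma_min, 245.0)

print(f"decay rate   > {gamma_min:.1e} s^-1")
print(f"deceleration > {a_min:.2e} m s^-2 (two-level estimate)")
```

Because the lifetime is only bounded from above, both printed numbers are lower bounds within this simplified model.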

From combination differences of energetically low-lying vibronic
transitions in the band system A²Π₁/₂-X²Σ⁺, we have derived
experimental values for the harmonic frequency, ωₑ, and the dissociation
energies, Dₑ, using a Morse potential approximation. Results are
given in Table 2, and further details of the analysis can be found in Methods
section ‘Spectroscopic analysis’.

State        ωₑ (cm⁻¹)     Dₑ (×10⁴ cm⁻¹)
X²Σ⁺         441.8(1)      2.92(5)
A²Π₁/₂       435.5(1)      2.90(3)

Furthermore, we measured the A²Π₁/₂ ← X²Σ⁺ vibronic spectra of
²²⁶RaF and the short-lived isotopologues ²²³RaF, ²²⁴RaF, ²²⁵RaF and ²²⁸RaF
(Fig. 3). All vibrational transitions were clearly observed, including
those of the molecule with the shortest-lived radium isotope studied,
²²⁴RaF (T₁/₂ = 3.6 d). An on-line irradiation of the target material will
enable the study of molecules containing isotopes with lifetimes as 
short as a few tens of milliseconds. The main limitation is dictated by 
the release from the target and the time spent in the RFQ trap (>5 ms). 
Future high-resolution measurements will enable studies of nuclear 
structure changes resulting from different isotopes and nuclear spins. 


Conclusions and future perspectives 


In summary, this Article presents an experimental approach for performing
laser spectroscopy studies of molecules containing radioactive
nuclei, which are typically produced at rates lower than 10° molecules
s⁻¹. Our results have established the energetically low-lying
electronic structure of RaF, providing experimental evidence for the 
suitability of this diatomic molecule in a laser-cooling scheme. These 
findings are a pivotal step towards precision measurements in this 
system, which are expected to provide a highly sensitive environment 
for the exploration of physics beyond the Standard Model of particle 
physics. 


Fig. 3 | Vibronic spectra measured for different isotopologues of RaF.
Measured vibronic absorption spectra for the A²Π₁/₂ ← X²Σ⁺ transition are
shown for the isotopologues ²²³RaF, ²²⁴RaF, ²²⁵RaF, ²²⁶RaF and ²²⁸RaF.
Wavenumber values are relative to the transition (0, 0) of ²²⁶RaF.
[Figure: counts (a.u.) for each Ra atomic mass; panel labels give T₁/₂ = 11.4 d (²²³RaF), 3.6 d (²²⁴RaF), 14.9 d (²²⁵RaF), 1,600 y (²²⁶RaF) and 5.7 y (²²⁸RaF).]

Our experimental scheme can also be used to perform laser spectroscopy
of a wide variety of neutral molecules and molecular ions,
including those composed of isotopes with lifetimes of a few tens 
of milliseconds. Radioactive molecules can be precisely tailored to 
enhance their sensitivity to parity- and time-reversal-violating effects 
by introducing heavy and octupole-deformed nuclei. Moreover, by 
systematically replacing their constituent nuclei with different 
isotopes of the same element, both nuclear-spin-independent and 
nuclear-spin-dependent effects can be comprehensively studied. In 
addition, the present technique is applicable to other molecules of 
interest in studies of fundamental physics that are as yet experimentally
unexplored, such as RaOH (ref. ””), RaO (ref. 8), RaH (ref. ”), AcF
(ref. °°) and °ThO (ref. *). 

In addition to the impact of our findings on quantum chemistry, 
nuclear structure and fundamental physics research, the ability to 
produce, mass-select and spectroscopically study short-lived radioactive
molecules is of importance to other fields of research such as
radiochemistry and astrophysics. Laboratory measurements of
the spectra of radioactive molecules of astrophysical interest will allow
their unambiguous identification in future astronomical observations.
Furthermore, the possibility of performing spectroscopy on fast molecular
beams will enable sub-Doppler spectroscopy to be performed even
on molecules created at high temperatures (>600 K). Thus, we expect
our results will motivate further avenues of research at the increasingly
capable radioactive-ion-beam facilities around the world.


Methods


Production of RaF molecules 

Ra isotopes were produced 33 d before the laser-spectroscopy measurements
by impinging 1.4-GeV protons on the cold UCₓ target material.
The target was exposed to pulses of 10” protons per pulse over a period
of 2 d. After irradiation with a total of 8 x 10” protons, the target was
kept in a sealed chamber filled with Ar gas. After day 33, the target was
connected to the High-Resolution Separator (HRS) front-end at ISOLDE.
FLUKA simulations predicted 2 x 10" atoms of ”*Ra in the target material
(7.5 10° g), following proton irradiation of a cold target. The target
was pumped down to pressures below 10° mbar, and the target holder
and ion source were gradually heated up to about 1,300 °C, in order for
the Ra isotopes to diffuse towards the surface of the target material. A
leak valve attached to the target was used to inject CF₄ into the target
environment. The CF₄ molecules dissociate and react with atoms and
molecules on the target surface until an equilibrium is reached. RaF
molecules were formed by reactive collisions of CF₄ molecules with
Ra atoms present inside the irradiated target material.

According to thermodynamic equilibrium calculations, RaF₂ or
RaF are expected to form, depending on the local temperature. Within
the temperature gradient between the target (1,300 °C) and the ion
source (2,000 °C), RaF₂ fully reacts to form RaF. A measured ratio of
the ion-beam intensity of Ra⁺ to RaF⁺ of less than 0.05 indicates that
more than 95% of the Ra isotopes released from the target material are
converted and extracted as molecules.

The ²²⁶RaF⁺ (A = 245) beam extracted from the ISOLDE target unit was
sent to the ISOLTRAP setup, where the molecular ions were captured,
cooled and bunched by a different RFQ trap and subsequently analysed
using a multi-reflection time-of-flight mass spectrometer. A measured
mass spectrum is shown in Extended Data Fig. 1. After 1,000 revolutions
in the device, a mass resolving power (R = m/Δm) of 1.7 x 10° was
achieved, which allowed the isobaric beam composition to be analysed.
The only mass peak detected was identified as the signal of ²²⁶Ra¹⁹F⁺,
confirming the purity of the beam from ISOLDE.

The intensity of RaF⁺ molecules depends strongly on the target and
ion source temperature. For a target temperature of 1,300 °C, a mean
value of 2 x 10’ molecules s⁻¹ of ²²⁶RaF⁺ was measured after the mass
separator. Depending on the molecular mass and beam intensity,
the transmission efficiency through the RFQ trap varied from 15% to
30%. The ion-beam transmission from the ion trap to the interaction
region was measured to be 25(5)%. The charge exchange cell vapour
was heated to produce a measured neutralization rate of 30(5)%. Thus,
we estimate that on average 5 x 10‘ neutral ²²⁶RaF molecules s⁻¹ were
delivered to be resonantly excited. From the analysis of the measured
spectra it was concluded that the neutral molecules populate the
low-lying vibrational states v = 0, 1, 2, 3, 4 following a relative population
of 0.47:0.29:0.13:0.05:0.03. Resonantly ionized molecules with
rates of the order of 10° counts s⁻¹ at the peak of the 0 ← 0 transition
were measured at the particle detector. Future production of RaF⁺
molecular rates of the order of 10°-10" molecules s⁻¹ is feasible using
active proton irradiation.
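
The efficiencies quoted in this section can be chained to see what fraction of the mass-separated RaF⁺ beam arrives as neutral molecules in the interaction region. The sketch below is a minimal illustration that uses central values only and ignores the quoted uncertainties; the variable and function names are hypothetical, and the calculation deliberately works with relative efficiencies rather than absolute beam rates.

```python
# Efficiency chain from the mass separator to the interaction region.
# Central values only; the quoted uncertainties are ignored in this sketch.

rfq_transmission = (0.15, 0.30)   # RFQ trap, range quoted in the text
beamline_transmission = 0.25      # ion trap -> interaction region, 25(5)%
neutralization = 0.30             # charge-exchange cell, 30(5)%


def overall_efficiency(rfq: float) -> float:
    """Fraction of mass-separated RaF+ ions delivered as neutral RaF."""
    return rfq * beamline_transmission * neutralization


low, high = (overall_efficiency(t) for t in rfq_transmission)
print(f"overall efficiency: {low:.2%} to {high:.2%}")

# Relative populations of the vibrational states v = 0..4 quoted in the text.
populations = [0.47, 0.29, 0.13, 0.05, 0.03]
total = sum(populations)
print(f"fraction of molecules in v = 0..4: {total:.2f}")
```

In this simple product, roughly one to two per cent of the separated ions reach the laser interaction region as neutral molecules, and the five lowest vibrational states account for nearly all of them.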


Laser setup 

The resonance ionization schemes used for the study of RaF molecules
are shown in Fig. 1. Three different laser systems were prepared to cover
the scanning range from 12,800 cm⁻¹ to 13,800 cm⁻¹: 1) A dye-laser system
(Dye1; Spectrolase 4000, Spectron) provided pulses of 100(5) μJ
with a linewidth of 10 GHz (0.3 cm⁻¹). 2) A dye laser (Dye2; Cobra, Sirah)
with a narrower linewidth of 2.5 GHz (0.09 cm⁻¹) produced pulses of
similar energy. The lasers were loaded with either Styryl 8 or DCM
dyes to provide wavenumber ranges 12,800-14,000 cm⁻¹ and
15,150-16,600 cm⁻¹, respectively. Both dye lasers were pumped by 532-nm
pulses at 100 Hz, obtained from two different heads of a twin-head
Nd:YAG laser (LPY 60150-100 PIV, Litron). 3) A grating Ti:sapphire laser
system with a linewidth of 2 GHz (0.07 cm⁻¹) produced pulses of 20(1) μJ,
pumped by 532-nm pulses at 1 kHz from a Nd:YAG laser (LDP-100MQ,
LEE Laser). The non-resonant ionization step was obtained by 355-nm
pulses of 30 mJ at 100 Hz, produced by the third-harmonic output of
a high-power Nd:YAG laser (TRLi 250-100, Litron).

The release of the ion bunch was synchronized with the laser pulses 
by triggering the flash-lamps and Q-switch of the pulsed lasers with a 
digital delay pulse generator (Quantum Composers 9528). 

The dye-laser wavelengths were measured with a wavelength meter 
(WS6-600 HighFinesse) and the Ti:sapphire laser wavelengths were 
measured by a wavelength meter (WSU-2 HighFinesse) calibrated by 
measuring a reference wavelength provided by a stabilized diode laser 
(DLC DL PRO 780, Toptica). 


Collinear and anti-collinear excitation 

For the initial peak searching, a zero-degree mirror at the end of the
beam line was used to reflect the laser light anti-collinearly with respect
to the travelling direction of the RaF bunch. Thus, each scanning laser
covered two different wavenumber regions in the molecular rest frame,
owing to the Doppler shift present for the fast RaF molecules. For a
molecule travelling at velocity v, the laser wavenumber in the laboratory
frame, ν̃₀, is related to the wavenumber in the molecule rest frame,
ν̃, by the expression

$$\tilde{\nu} = \frac{1 - \beta\cos\theta}{\sqrt{1-\beta^{2}}}\,\tilde{\nu}_{0},$$

with β = v/c (c, speed of light in vacuum) and where θ is the angle
between the direction of the laser
beam and the velocity of the molecule. For RaF molecules at 39,998(1) eV
(v = 0.18 m μs⁻¹), a difference of 15.7 cm⁻¹ is obtained between the laser
pulse sent out collinearly (cos θ = 1) and anti-collinearly (cos θ = −1) with
respect to the direction of the velocity of the molecule.
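
The quoted beam velocity and the 15.7 cm⁻¹ splitting can be checked with a few lines of code. This is a minimal sketch that assumes the relativistic Doppler relation ν̃ = ν̃₀ (1 − β cos θ)/√(1 − β²), a molecular mass of 245 u for ²²⁶RaF, a non-relativistic kinetic energy and an illustrative laboratory wavenumber of 13,300 cm⁻¹; the function names are hypothetical.

```python
import math

C = 299_792_458.0            # speed of light, m/s
E_CHARGE = 1.602176634e-19   # joules per electronvolt
U = 1.66053906660e-27        # atomic mass constant, kg


def beam_velocity(energy_ev: float, mass_u: float) -> float:
    """Non-relativistic velocity (m/s) for a given kinetic energy and mass."""
    return math.sqrt(2.0 * energy_ev * E_CHARGE / (mass_u * U))


def rest_frame_wavenumber(lab_wavenumber: float, beta: float, cos_theta: float) -> float:
    """Wavenumber seen by the molecule for a laser at angle theta to its velocity."""
    return lab_wavenumber * (1.0 - beta * cos_theta) / math.sqrt(1.0 - beta * beta)


v = beam_velocity(39_998.0, 245.0)   # 245 u is an approximate mass for 226RaF
v_m_per_us = v / 1e6                 # m/s -> m per microsecond
beta = v / C

lab = 13_300.0                       # cm^-1, illustrative laser setting
collinear = rest_frame_wavenumber(lab, beta, +1.0)
anticollinear = rest_frame_wavenumber(lab, beta, -1.0)
split = anticollinear - collinear

print(f"v = {v_m_per_us:.3f} m/us, beta = {beta:.3e}")
print(f"anti-collinear minus collinear: {split:.2f} cm^-1")
```

The splitting scales as 2βν̃₀, so it changes only slightly across the scanned range.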


Spectroscopic analysis 
The peaks in the different spectra were identified by rebinning the
spectra using coarse bin sizes with values up to 1 cm⁻¹. Only groups of
data points that were consistently observed with a 5-sigma significance
above background were considered as candidates for transitions. The
vibrational transitions in Fig. 2 show asymmetric line profiles with a
maximum located towards higher wavenumbers. The band centres
cannot be determined directly from the measured line profiles, and so
we used the wavenumber positions of the maxima in our data analysis.
Extended Data Table 1 lists the maximum peak positions and estimated
uncertainties are given in parentheses. The wavenumber differences,
Δν̃, of vibrational levels in the electronic ²Σ⁺ ground state and in the
²Π₁/₂ excited state were derived from combination differences of the
recorded ²²⁶RaF spectra (see Extended Data Table 1).

In our analysis we used vibrational energy terms E_v/(hc) of a Morse
potential according to:

$$\frac{E_v}{hc} = \omega_e\left(v+\tfrac{1}{2}\right) - \frac{\omega_e^{2}}{4D_e}\left(v+\tfrac{1}{2}\right)^{2} \qquad (1)$$

Energy-level differences

$$\frac{E_{v+1}-E_v}{hc} = \omega_e - \frac{\omega_e^{2}}{2D_e}\,(v+1) \qquad (2)$$

were used to derive the Morse potential parameters ωₑ and Dₑ from a
least-squares fit analysis. The derived energy-level differences are given
in Extended Data Table 1, whereas Extended Data Table 2 contains the
molecular parameters from the fit. The harmonic vibration frequencies
ωₑ of the ²Σ⁺ and ²Π₁/₂ states are almost identical and correspond well
to the theoretical predictions with a deviation of less than 5%; see
Extended Data Table 2. The same holds for the estimated dissociation
energy Dₑ, which is in better agreement with the values of ref. °, as
therein also the low-energy part of the potentials was used to estimate
the dissociation energy.

In the case of the two low-lying ²Π fine-structure levels, the
observed origins agree well with the calculated values based
on the Relativistic Correlation Consistent-Atomic Natural Orbital
(RCC-ANO) basis set. From the energy difference of the fine-structure
components the effective spin-orbital coupling parameter A is
derived. For the ²Π states, the experimental value of 2,068(5) cm⁻¹
is in good agreement with the calculated value. The band origins are
in reasonable agreement with results from the RCC-ANO basis set
calculation, if one attributes the Ω = 3/2 levels, which were computationally
found to be of mixed Π₃/₂ and Δ₃/₂ character, in this order
of energies. A reverse assignment also gives better agreement with
experiment. Calculations of the gas-phase bond lengths, dissociation
energies and additional properties of RaF molecules have been
reported.
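
The Morse analysis described in this section can be sketched as follows: form combination differences from the band positions in Table 1, then fit the linear relation ΔE = ωₑ − (ωₑ²/2Dₑ)(v + 1) to obtain ωₑ and Dₑ. The sketch makes several simplifying assumptions that may differ from the actual analysis: it uses an unweighted least-squares fit, takes the tabulated band-head positions directly, and leaves out the tentative Δv = −1 bands. All function names are hypothetical.

```python
# Band-head positions (cm^-1) of the A 2Pi1/2 <- X 2Sigma+ system from Table 1,
# keyed by (v_upper, v_lower). The tentative (0, 1) and (1, 2) bands are omitted.
bands = {
    (0, 0): 13284.7, (1, 1): 13278.5, (2, 2): 13272.4, (3, 3): 13266.4, (4, 4): 13260.2,
    (1, 0): 13716.9, (2, 1): 13707.4, (3, 2): 13698.0, (4, 3): 13688.6, (5, 4): 13679.4,
}


def ground_state_differences(bands):
    """E(v+1) - E(v) in the lower state: bands sharing the same upper level."""
    out = []
    for (v_up, v_low), nu in bands.items():
        if (v_up, v_low + 1) in bands:
            out.append((v_low, nu - bands[(v_up, v_low + 1)]))
    return sorted(out)


def excited_state_differences(bands):
    """E(v+1) - E(v) in the upper state: bands sharing the same lower level."""
    out = []
    for (v_up, v_low), nu in bands.items():
        if (v_up + 1, v_low) in bands:
            out.append((v_up, bands[(v_up + 1, v_low)] - nu))
    return sorted(out)


def fit_morse(differences):
    """Unweighted linear fit of dE = w_e - (w_e**2 / (2 * D_e)) * (v + 1)."""
    xs = [v + 1 for v, _ in differences]
    ys = [d for _, d in differences]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                       # equals -w_e**2 / (2 * D_e)
    omega_e = mean_y - slope * mean_x       # intercept of the fitted line
    d_e = omega_e ** 2 / (-2.0 * slope)
    return omega_e, d_e


omega_x, d_x = fit_morse(ground_state_differences(bands))
omega_a, d_a = fit_morse(excited_state_differences(bands))
print(f"X state: omega_e = {omega_x:.1f} cm^-1, D_e = {d_x:.3g} cm^-1")
print(f"A state: omega_e = {omega_a:.1f} cm^-1, D_e = {d_a:.3g} cm^-1")
```

Even this unweighted version lands close to the values in Table 2, which suggests that the tabulated parameters are determined mainly by the Δv = 0 and Δv = +1 bands.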


Data availability 


Examples of vibronic spectra measured for RaF molecules are included 
as source data with this Article. All other relevant data supporting the 
findings of these studies are available from the corresponding author 
upon request.


Extended Data Fig. 1 | Time-of-flight spectrum measured at mass A = 245.
The time-of-flight spectrum of the ²²⁶RaF⁺ (A = 245) beam as delivered from
ISOLDE after 1,000 revolutions in the multi-reflection time-of-flight mass
spectrometer. A mass resolving power of 1.7 x 10° was achieved, which allowed
the isobaric beam composition to be analysed. Only ²²⁶RaF⁺ ions were
detected. The positions of the most probable accompanying ions are
highlighted by dotted vertical lines.


Extended Data Table 1 | ²²⁶RaF vibrational transitions in the
electronic X²Σ⁺ ground state and the A²Π₁/₂ excited state
derived from combination differences

v′-v″     ²Σ⁺ Δν̃ (cm⁻¹)     ²Π₁/₂ Δν̃ (cm⁻¹)
1-0       438.4(7)          432.2(7)
2-1       435.0(7)          428.9(7)
2-1       435.4(11)         429.3(11)
3-2       431.6(11)         425.6(7)
4-3       428.4(14)         422.2(14)
5-4       -                 419.2(14)

Combined statistical and systematic uncertainties are given in parentheses.


Extended Data Table 2 | Molecular parameters of RaF from vibrational analysis of the electronic ground state (X²Σ⁺) and
excited states (A²Π, B²Δ and C²Σ⁺)

State             ωₑ (cm⁻¹)            Tₑ (10⁴ cm⁻¹)             A (10³ cm⁻¹)    Dₑ (10⁴ cm⁻¹)    Ref.
X²Σ⁺              441.8(1)                                                       2.92(5)          this work
                  432                                                            3.08             theo. (a)
                  431                                                            4.26             theo. (b)
A²Π₁/₂/²Π₃/₂      435.5(1)/419.1(2)    1.32878(1)/1.53554(3)     2.0676(36)      2.90(3)/-        this work
                  428/410              1.40/1.60                 2.0             3.13/-           theo. (a)
                  428/415              1.33/1.50                 1.7                              theo. (b)
B²Δ₃/₂/²Δ₅/₂      431.9(2)/-           1.51477(2)/-              -/-             2.83(11)/-       this work
                  432/419              1.64/1.71                 0.4                              theo. (a)
                  431/423              1.54/1.58                 0.2                              theo. (b)
C²Σ⁺              430.9(2)             1.61806(1)                                2.78(9)          this work

(a) These calculations used Fock space coupled cluster singles and doubles, the Dyall basis set and a smaller active space.
(b) These calculations used Fock space coupled cluster singles and doubles, the RCC-ANO basis set and a larger active space.
Experimental results are compared with theoretical predictions, theo.

For a direct comparison, theoretical values given for ωₑ should be scaled by √(m_p/u) = 1.0036 to account for the atomic mass constant instead of the proton mass, m_p, in atomic mass units (u).


Plasmonics enables the manipulation of light beyond the optical diffraction limit
and may therefore confer advantages in applications such as photonic devices,
optical cloaking, biochemical sensing¹⁰,¹¹ and super-resolution imaging¹²,¹³.
However, the essential field-confinement capability of plasmonic devices is always
accompanied by a parasitic Ohmic loss, which severely reduces their performance.
Therefore, plasmonic materials (those with collective oscillations of electrons) with a
lower loss than noble metals have long been sought. Here we present stable
sodium-based plasmonic devices with state-of-the-art performance at near-infrared 
wavelengths. We fabricated high-quality sodium films with electron relaxation times 
as long as 0.42 picoseconds using a thermo-assisted spin-coating process. A 
direct-waveguide experiment shows that the propagation length of surface plasmon 
polaritons supported at the sodium—quartz interface can reach 200 micrometres at 
near-infrared wavelengths. We further demonstrate a room-temperature 
sodium-based plasmonic nanolaser with a lasing threshold of 140 kilowatts per square 
centimetre, lower than values previously reported for plasmonic nanolasers at 
near-infrared wavelengths. These sodium-based plasmonic devices show stable 
performance under ambient conditions over a period of several months after 
packaging with epoxy. These results indicate that the performance of plasmonic 
devices can be greatly improved beyond that of devices using noble metals, with 
implications for applications in plasmonics, nanophotonics and metamaterials. 


Of the plasmonic materials, the noble metals, particularly silver and 
gold, are those most often used owing to their relatively low loss. 
However, the optical loss of the two metals is still not commercially 
acceptable and has been the primary limiting factor for the widespread 
applications of plasmonics. Therefore, there has been a persistent
search for low-loss alternatives, such as crystalline metals, intermetallic
composites, metal alloys, nitrides and oxides.

Of these alternatives, the alkali metals, such as sodium, have long
been regarded as ideal plasmonic materials, primarily because of
their low intraband damping rate. When light interacts with a plasmonic
metal, it suffers from scattering damping of the intraband transition
of electrons (γ_p), and the total intraband damping rate can be
obtained by:

$$\gamma_p = \gamma_{e\text{-}p} + \gamma_{e\text{-}e} + \gamma_{e\text{-}i}, \qquad (1)$$

where γ_e-p, γ_e-e and γ_e-i are the optical damping rates that originate
from electron-phonon scattering, electron-electron scattering and
electron-impurity scattering, respectively. The overall intraband
optical loss γ_p of sodium is estimated to be around 0.010 eV (Supplementary
Information), corresponding to a relaxation time of 0.42 ps.
For comparison, silver has an intraband optical loss of 0.021 eV and a
relaxation time of 0.20 ps, based on the Drude-Lorentz model. In addition,
sodium has electron gases with a density of 2.65 × 10²² cm⁻³,
which is approximately half that of silver, which is another important
factor contributing to a decreased optical loss.
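
The quoted pairs of damping rate and relaxation time can be cross-checked numerically. The sketch below assumes the conversion τ = h/γ, with γ expressed as an energy in eV; this relation is inferred here from the quoted numbers rather than taken from the text, and the Supplementary Information may define it differently. The function name is hypothetical.

```python
H_EV_S = 4.135667696e-15   # Planck constant in eV s


def relaxation_time_ps(damping_ev: float) -> float:
    """Relaxation time in ps for a damping rate given as an energy in eV (tau = h / gamma)."""
    return H_EV_S / damping_ev * 1e12


tau_na = relaxation_time_ps(0.010)   # sodium, gamma_p of about 0.010 eV
tau_ag = relaxation_time_ps(0.021)   # silver, gamma_p = 0.021 eV

print(f"sodium: {tau_na:.2f} ps")
print(f"silver: {tau_ag:.2f} ps")
print(f"ratio:  {tau_na / tau_ag:.1f}")
```

With this assumed conversion, the rounded damping rates reproduce the quoted relaxation times to within the rounding of the inputs, and the sodium value comes out about twice the silver value.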

Although sodium has been predicted to be an ideal plasmonic mate- 
rial for years, the experimental exploration of sodium as a plasmonic 
material has been limited, apart from the early measurement of its 
optical constants””° and the demonstration of localized plasmon reso- 
nances in alkali metal nanoparticles precipitated within crystal matri- 
ces”. Because of its high chemical reactivity, fabricating sodium-based 
structures using conventional metal-deposition techniques, such as 
physical deposition, has been challenging”. 

Here, by taking advantage of the low melting point of sodium, we
have developed a thermo-assisted spin-coating process for fabricating
a sodium film, as shown in Fig. 1a (optical images of the procedure
in Extended Data Fig. 1a). The sodium metal is heated up to 160 °C to
form a droplet. During this thermal annealing process, impurities
diffuse towards the surface and are incorporated into a thin oxide
shell, which can then be peeled off to purify the sodium droplet. The
purified sodium droplet, which has a luminous appearance, is then
spin-coated onto an ultrasmooth quartz substrate (roughness of about
0.1 nm; Extended Data Fig. 1b). Once the sodium droplet touches the
surface of the spinning quartz, an ultrasmooth sodium film forms
(Fig. 1b) owing to the strong centrifugal forces of the rotating
quartz and fast solidification. The entire process is conducted inside
a glove box with an inert atmosphere. The X-ray diffraction pattern
shows that the sodium film produced is polycrystalline (Fig. 1c, where
the sample is sealed with Surlyn, an ionomer resin transparent to
X-rays for a wide range of angles).

Fig. 1 | Sodium film fabricated by a thermo-assisted spin-coating process.
a, Schematic of the process flow. A piece of sodium is heated and melted while
inner impurities diffuse towards the surface during this thermal annealing
process; the thin oxide shell incorporating the impurities covers the melted
sodium droplet and is then peeled off; the cleaned sodium droplet is
spin-coated onto a rotating quartz substrate; the sodium film then forms on
the quartz surface with a smooth interface. b, Photograph of the spin-coated
sodium film on quartz. Scale bar, 5 mm. c, X-ray diffraction pattern of the
spin-coated sodium film. The sodium film is covered with Surlyn as packaging.
XRD, X-ray diffraction; a.u., arbitrary units.

To evaluate the optical properties of the prepared sodium film, we
measured its dielectric function in the wavelength range 400-1,500 nm
using a spectroscopic ellipsometer. Figure 2a depicts the measured
dielectric functions versus wavelength of the sodium film (solid circle
symbols), which exhibits a lower optical loss than reported values for
silver over a wide range of wavelengths (see Extended Data Fig. 2 for
comparison).

To quantitatively analyse the loss mechanisms of the sodium film, we
used a Drude-Lorentz model’*” to fit the measured dielectric curves
(dashed lines in Fig. 2a), which is expressed as:

$$\varepsilon(\omega) = \varepsilon_b - \frac{\omega_p^2}{\omega^2 + i\omega\gamma_p} + \frac{f_1\omega_1^2}{\omega_1^2 - \omega^2 - i\omega\gamma_1}, \tag{2}$$

where $\varepsilon_b$ is the polarization response from the core
electrons (background permittivity), $\omega_p$ is the bulk plasma
frequency, $\gamma_p$ is the Drude damping rate, $f_1$ and $\omega_1$
are the amplitude and resonant frequency of the inter-band transition,
respectively, and $\gamma_1$ is the related inter-band damping rate.
The fitting parameters for the sodium film are $\varepsilon_b$ = 0.500,
$\omega_p$ = 5.414 eV, $\omega_1$ = 2.945 eV, $f_1$ = 0.280,
$\gamma_1$ = 2.706 eV and $\gamma_p$ = 0.010 eV. The relaxation time
$\tau$, determined from the Drude damping rate, is 0.42 ps. Notably,
compared to bulk silver, the optical damping rate of our sodium film is
reduced by half and the relaxation time is doubled ($\gamma_p$ = 0.021 eV
and $\tau$ = 0.20 ps for silver’’).
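
As an illustration of how the fitted model can be evaluated, the following minimal Python sketch computes the Drude-Lorentz permittivity and the material figure of merit $-\varepsilon_1/\varepsilon_2$ from the parameters quoted above. The function and constant names are illustrative, not the authors' code. The conversion $\tau = h/\gamma_p$ is an assumption inferred from the quoted numbers (it gives about 0.41 ps for 0.010 eV and about 0.20 ps for 0.021 eV); the text does not state which convention was used.

```python
# Fit parameters quoted in the text for the sodium film (energies in eV).
EPS_B = 0.500     # background permittivity
OMEGA_P = 5.414   # bulk plasma frequency
GAMMA_P = 0.010   # Drude damping rate
F1 = 0.280        # inter-band oscillator amplitude
OMEGA_1 = 2.945   # inter-band resonance energy
GAMMA_1 = 2.706   # inter-band damping rate

HC_EV_NM = 1239.841984    # h*c in eV nm, converts wavelength to photon energy
H_EV_S = 4.135667696e-15  # Planck constant in eV s


def drude_lorentz(energy_ev: float) -> complex:
    """Permittivity eps1 + i*eps2 of the fitted model at a photon energy in eV."""
    w = energy_ev
    drude = OMEGA_P**2 / (w**2 + 1j * w * GAMMA_P)
    lorentz = F1 * OMEGA_1**2 / (OMEGA_1**2 - w**2 - 1j * w * GAMMA_1)
    return EPS_B - drude + lorentz


def figure_of_merit(eps: complex) -> float:
    """Material figure of merit -eps1/eps2."""
    return -eps.real / eps.imag


def relaxation_time_ps(gamma_ev: float) -> float:
    """Relaxation time in ps, ASSUMING tau = h / gamma (inferred convention)."""
    return H_EV_S / gamma_ev * 1e12


eps_1500 = drude_lorentz(HC_EV_NM / 1500.0)  # model value at 1,500 nm

if __name__ == "__main__":
    print("eps(1500 nm) =", eps_1500)
    print("figure of merit =", figure_of_merit(eps_1500))
    print("tau (Na) =", relaxation_time_ps(GAMMA_P), "ps")
```

The sketch evaluates the fitted curve only; it does not reproduce the ellipsometry data themselves.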

To further illustrate the reduced optical loss of our sodium film
compared to silver, we calculated the figure of merit (defined by
$-\varepsilon_1/\varepsilon_2$; ref.°), as shown in Fig. 2b. We chose
two sets of silver data from recent publications for comparison”. The
sodium film shows state-of-the-art performance in the near-infrared
region of the spectrum.

This low-loss sodium film provides an opportunity to improve 
the performance of plasmonic devices. We now demonstrate 
high-performance sodium-based plasmonic waveguides and plas- 
monic nanolasers at near-infrared wavelengths. A two-step fabrica- 
tion process—spin-coating of sodium and packaging—was conducted 
inside a glove box. All characterizations were carried out under ambient 
conditions, including the measurements of the dielectric function, 
plasmonic waveguiding and plasmonic lasing. 

Figure 2c shows a schematic of the sodium-based plasmonic waveguide
structure. The left coupler converts the incident laser beam
to surface plasmon polaritons (SPPs) (Fig. 2d, left light spot), which
propagate along the sodium-quartz interface and are then coupled
to free space via the right coupler (Fig. 2d, right light spot). The
intensity of the SPPs decreases exponentially as a function of distance.
Various propagation separations were measured to fit the intensity
decay curve (Fig. 2e) and to obtain the propagation lengths
($\delta_{\mathrm{SPP}}$) at different wavelengths on the sodium-quartz
interface (Fig. 2f). The inset to Fig. 2e shows the scanning electron
microscopy (SEM) image of the launching and outcoupling structures for
the SPP propagation measurements (the fabrication process for the
plasmonic waveguide devices is described in Extended Data Fig. 3). The
sodium film supports a $\delta_{\mathrm{SPP}}$ of 200 μm at a wavelength
of 1,500 nm, which represents state-of-the-art performance for plasmonic
waveguides (see Extended Data Fig. 4a–c for comparison with silver).
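
The extraction of a propagation length from intensities measured at several coupler separations can be sketched as a straight-line fit to the logarithm of the intensity. The Python example below is a minimal illustration under stated assumptions: the decay is taken as $I(x) = I_0 \exp(-x/\delta_{\mathrm{SPP}})$ (a 1/e intensity length), the data are synthetic and noise-free, and the function name is hypothetical; none of this is the authors' analysis code.

```python
import math


def fit_propagation_length(separations_um, intensities):
    """Least-squares line through (x, ln I); returns (delta_spp_um, i0).

    Assumes I(x) = i0 * exp(-x / delta_spp), so the slope is -1/delta_spp.
    """
    xs = list(separations_um)
    ys = [math.log(i) for i in intensities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -1.0 / slope, math.exp(intercept)


# Synthetic, noise-free example data (NOT measured values from the paper).
true_length_um = 200.0
separations = [50.0, 100.0, 150.0, 200.0, 250.0]
intensities = [math.exp(-x / true_length_um) for x in separations]

length, i0 = fit_propagation_length(separations, intensities)

if __name__ == "__main__":
    print("fitted propagation length (um):", length)
```

With real data, a weighted or nonlinear fit may be preferable because taking the logarithm amplifies noise at low intensity.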

In addition to the propagation length, the effective mode size,
expressed as the sum of the decay lengths (skin depths) in the dielectric
$\delta_d$ and in the metal $\delta_m$, is another essential parameter of
the plasmonic waveguide, characterizing the field-confinement capability.
Therefore, we can define the figure of merit of the plasmonic waveguide
as the ratio of the propagation length to the skin depth,
$\delta_{\mathrm{SPP}}/(\delta_d + \delta_m)$. The figures of merit of
the plasmonic waveguide of sodium and silver are plotted in Fig. 2g,
further identifying sodium as a promising material of choice for
plasmonic-waveguide-based applications at near-infrared wavelengths
(see Extended Data Fig. 4d for comparison with more data for silver).

Fig. 2 | Dielectric functions of sodium film and sodium-based plasmonic
waveguides. a, Real and imaginary parts of the dielectric functions
($\varepsilon = \varepsilon_1 + i\varepsilon_2$) of the sodium film (solid
circles). Dashed lines show the Drude-Lorentz fittings. b, Figure of merit
($-\varepsilon_1/\varepsilon_2$) comparison for sodium (this work) and
silver (from refs. 7>*). Comparisons with various data for silver in the
literature are shown in Extended Data Fig. 4. c, Schematic of a sodium-based
plasmonic waveguiding structure with two nanostructured couplers. The coupling
structures are pillar arrays. d, Optical image of the light spots of
sodium-based plasmonic waveguiding. The distance between the launching and
output structures is 100 μm. e, Propagation measurements at wavelengths of
1,180 nm, 1,350 nm and 1,500 nm for sodium, with exponential curves fitted to
the data. Inset, SEM image of the launching and outcoupling structures without
sodium coating. Scale bar, 30 μm. f, The fitted propagation length at different
wavelengths on the sodium-quartz interface. The dashed line is a fit to the
data. g, Figure of merit for plasmonic waveguides for sodium and silver, which
is defined as the ratio of propagation length to skin depth,
$\delta_{\mathrm{SPP}}/(\delta_d + \delta_m)$.

The plasmonic nanolaser is another widely studied plasmonic device
for which a low-loss metal has long been sought in order to achieve a
lower lasing threshold and power consumption’ *°. We fabricated
sodium-based plasmonic nanolasers based on a
metal-insulator-semiconductor gap plasmonic mode configuration™”. As
shown in Fig. 3a, b, the device consists of an InGaAsP
multi-quantum-wells (MQWs) nanodisk on top of the sodium film with a
7-nm-thick Al₂O₃ layer in between (see Extended Data Fig. 5 for
fabrication details). Figure 3c shows the SEM image of an InGaAsP MQWs
nanodisk without the sodium coating, with diameter and thickness of
1.2 μm and 200 nm, respectively. Figure 3d-f shows the simulated
electric-field distributions of the lasing plasmonic mode obtained by
three-dimensional full-wave simulations. The electric field is strongly
confined at the interface between InGaAsP and sodium, thus exhibiting a
pronounced plasmonic feature.

The sodium-based plasmonic nanolaser exhibits single-mode
lasing with a low lasing threshold under optical pumping (Fig. 3g).
Notably, pronounced resonance peaks appear in the spontaneous
emission spectrum below the lasing threshold, which indicates the
high quality factor Q of the sodium-based plasmonic nanocavity””™”.
When we pump the device above the lasing threshold, a single lasing
mode becomes dominant at 1,257 nm with a much narrower linewidth
than spontaneous emission. The side-mode suppression ratio of this
laser is about 20 dB (pump intensity 475 kW cm⁻²; see Extended Data
Fig. 6). We have identified this lasing cavity mode as the gap plasmonic
whispering-gallery mode with an azimuthal order of $l$ = 15 and a cold
cavity quality factor of about 340 using full-wave simulation (see
Extended Data Fig. 7).

Fig. 3 | Room-temperature sodium-based plasmonic nanolaser.
a, b, Schematics of a sodium-based plasmonic nanolaser in three dimensions (a)
and two dimensions (b). c, SEM image of an InGaAsP MQWs nanodisk, without
sodium coating. Scale bar, 500 nm. d, e, Top (d) and side (e) views of the
electric-field distribution of the calculated lasing mode. The mode is a
plasmonic whispering-gallery mode where the field is strongly confined at the
interface between the MQWs and sodium. f, Normalized electric-field profile
along the white dashed line in e. g, Spectra of spontaneous emission (black),
amplified spontaneous emission (red), and single-mode laser emission (green)
of the sodium-based plasmonic nanolaser.

The threshold of the laser can be extracted from the evolution
of the normalized spectra of the device versus the pump power
and the S-shaped light-light curve (Fig. 4a, b). The threshold of the
sodium-based plasmonic nanolaser is about 140 kW cm⁻², which, to
our knowledge, is the lowest reported value among near-infrared
plasmonic nanolasers at room temperature (see Extended Data Table 1 and
Extended Data Fig. 8).

Owing to the high material dispersion, plasmonic modes possess
a higher group index than photonic modes. Here we reveal
photonic-to-plasmonic mode jumping by tracking the group index
of the lasing modes with decreased cavity diameter (Fig. 4c). In the
experiment, we observe a sudden, clear increase of the group index at
a device diameter of about 2 μm, which indicates that plasmonic mode
lasing becomes dominant for smaller cavities. To further demonstrate
the effect of sodium in lasing, we carried out a control experiment in
which bare InGaAsP MQWs nanodisks are optically pumped. However, for the
devices with the same diameter as the one shown in Fig. 3, no lasing
behaviour is observed before they are burned by the pump laser, which
results from the larger radiation loss for the bare dielectric cavities™.
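
The group-index estimate from mode spacing mentioned above can be sketched as follows. The text says only that the group index is estimated from the mode spacing in the lasing spectra; the relation used here, $n_g \approx \lambda_a\lambda_b/(\pi D\,|\lambda_a - \lambda_b|)$ for adjacent whispering-gallery modes travelling around the full disk circumference $\pi D$, is a standard textbook approximation adopted as an assumption, and the function name is hypothetical. The example inputs are the two cavity-mode wavelengths and the disk diameter quoted in this article.

```python
import math


def group_index_from_mode_spacing(lambda_a_nm: float,
                                  lambda_b_nm: float,
                                  diameter_nm: float) -> float:
    """Group index from two adjacent whispering-gallery resonances.

    ASSUMES the round-trip path equals the full circumference pi * D; a
    mode confined slightly inside the rim would give a somewhat larger
    estimate for the same spacing.
    """
    spacing_nm = abs(lambda_a_nm - lambda_b_nm)
    return lambda_a_nm * lambda_b_nm / (math.pi * diameter_nm * spacing_nm)


# Modes at 1,257 nm and 1,343 nm in a 1.2-um-diameter disk (values from the text).
n_g = group_index_from_mode_spacing(1257.0, 1343.0, 1200.0)

if __name__ == "__main__":
    print("estimated group index:", n_g)
```

Repeating the estimate over devices of different diameter gives the kind of trend plotted in Fig. 4c, although the authors' exact procedure may differ.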

One natural concern with regard to the use of sodium for plasmonic 
devices is its stability. It is therefore encouraging to find that these 
sodium-based plasmonic devices remain stable over a long period 
of time after the packaging of quartz and epoxy. More details of our 
extensive accelerated-ageing tests of sodium-based devices (high tem- 
perature and high air humidity) can be found in Extended Data Fig. 9. 
Our nanolaser devices remain functional at a low threshold even after
six months (Extended Data Fig. 9g).

Fig. 4 | Lasing characteristics of sodium-based plasmonic nanolasers.
a, Normalized spectra of the sodium-based plasmonic nanolaser at different
pump powers. b, Light-light curve of the plasmonic nanolaser. The circle
symbols represent experimental data and the line is a fit to the data.
The substantially narrowed emission linewidth above the lasing threshold in a
and the S-shaped light-light curve in b show a clear phase transition from
spontaneous emission to lasing emission of the device. c, The group index
versus device diameter estimated from the mode spacing in lasing spectra.

In summary, we have demonstrated a method of fabricating
high-quality sodium films. Sodium-based plasmonic waveguides and
nanolasers have been realized and their high performance is verified
by systematic comparison with the reported devices based on noble
metals. Furthermore, we show that these sodium-based plasmonic
devices can operate stably over several months. Our results provide
an alternative pathway to low-loss plasmonic materials. The convenient
fabrication process and the state-of-the-art performance of the
sodium-based plasmonic devices open up opportunities for advanced
plasmonic applications.

Methods


Thermo-assisted spin-coating process of low-loss sodium film 
Extended Data Fig. 1a shows the thermo-assisted spin-coating
procedure for the sodium film. The sodium brick was melted to form
a droplet on a tungsten boat when it was heated up to 160 °C in a
glove box with an inert atmosphere. The oxide shell on the surface
of the droplet can be peeled off using tweezers. The liquid sodium
can then be dropped onto a fast-spinning quartz substrate on a spin
coater. The quartz substrate is ultrasmooth (root-mean-square surface
roughness of about 0.1 nm), as shown in Extended Data Fig. 1b. The
sodium-glass interface was then prepared. The sodium film, once
packaged with epoxy, can be transferred freely out of the glove box.


Dielectric constant characterization 

The dielectric constant $\varepsilon$ of the sodium film was measured
using an ellipsometer (RC2 UI, J. A. Woollam) over the range 400 nm to
1,500 nm (see Extended Data Fig. 2a). The incident light through the
silica substrate can be separated into two light spots; the detector
receives only the light from the metal-silica interface. We use the
Lorentz model (equations (3)-(5)) to fit the measured data. For the
ultraviolet pole and the infrared pole, $\varepsilon$ is expressed as

$$\varepsilon_{\mathrm{pole},n} = \frac{C_n}{E_n^2 - E^2}, \tag{3}$$

where $C_n$ (in units of eV²) and $E_n$ (in units of eV) are fitting
parameters, $E$ is the photon energy (in units of eV) and $n$ is the
oscillator number. For the general Lorentz oscillator, $\varepsilon$ is
expressed as

$$\varepsilon_{\mathrm{Lorentz},m} = \frac{A_m B_m E_m}{E_m^2 - E^2 - iEB_m}. \tag{4}$$

Here $A_m$ (unitless), $B_m$ (in units of eV) and $E_m$ (in units of eV)
are the fitting parameters, where $A_m$ approximately equals
$\varepsilon_2$ at its peak value and $B_m$ is approximately the
full-width at half-maximum. $m$ is the oscillator number. In the full
wavelength range, the overall dielectric function is

$$\varepsilon(E) = \varepsilon_1(E) + i\varepsilon_2(E) = \varepsilon_{\mathrm{offset}} + \varepsilon_{\mathrm{pole},n} + \varepsilon_{\mathrm{Lorentz},m}, \tag{5}$$

where $\varepsilon_{\mathrm{offset}}$ and $\varepsilon_{\mathrm{pole},n}$
are real, and contribute only to $\varepsilon_1$. The oscillator
functions are complex and therefore contribute to both $\varepsilon_1$
and $\varepsilon_2$.
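
The pole and Lorentz-oscillator terms of equations (3)-(5) can be written out directly. The Python sketch below is illustrative only: the function names are hypothetical, the numerical parameters are placeholders rather than the fitted values (which are not given in this section), and the terms are summed over all poles and oscillators, which is an assumption about how the indices $n$ and $m$ enter equation (5).

```python
def eps_pole(energy_ev: float, c_n: float, e_n: float) -> float:
    """Real pole term C_n / (E_n^2 - E^2); c_n in eV^2, e_n in eV."""
    return c_n / (e_n**2 - energy_ev**2)


def eps_lorentz(energy_ev: float, a_m: float, b_m: float, e_m: float) -> complex:
    """Complex Lorentz term A_m B_m E_m / (E_m^2 - E^2 - i E B_m)."""
    return a_m * b_m * e_m / (e_m**2 - energy_ev**2 - 1j * energy_ev * b_m)


def eps_total(energy_ev, eps_offset, poles, oscillators) -> complex:
    """Offset plus the sum of all pole and Lorentz terms (assumed summation).

    poles: iterable of (c_n, e_n); oscillators: iterable of (a_m, b_m, e_m).
    """
    total = complex(eps_offset)
    for c_n, e_n in poles:
        total += eps_pole(energy_ev, c_n, e_n)
    for a_m, b_m, e_m in oscillators:
        total += eps_lorentz(energy_ev, a_m, b_m, e_m)
    return total


# Placeholder parameters for illustration only (NOT the fitted values).
example = eps_total(
    1.0,
    eps_offset=1.0,
    poles=[(10.0, 8.0), (0.05, 0.1)],   # one ultraviolet and one infrared pole
    oscillators=[(1.5, 0.3, 2.0)],
)

# At resonance (E = E_m) the Lorentz term is purely imaginary and equal to
# i * A_m, which is why A_m approximates the peak value of eps_2.
at_resonance = eps_lorentz(2.0, 1.5, 0.3, 2.0)
```

Because the pole terms are real, only the Lorentz oscillators contribute to the imaginary part in this sketch, in line with the statement above.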


Comparison of real and imaginary parts of dielectric functions 
of sodium and silver 

Extended Data Fig. 2b, c shows a comparison of the dielectric functions
of sodium and silver. Clearly, the optical loss (the imaginary part) of
sodium is lower than that for silver, especially in the infrared region. We
further compare the figure of merit ($-\varepsilon_1/\varepsilon_2$) in
Extended Data Fig. 2d. The data for silver are measurements taken from the
literature”? 474.


Fabrication of nanostructured sodium film 

A thin Ag film (about 30 nm) was evaporated onto the quartz (thickness
about 0.2 mm) as a conductive layer using physical vapour deposition
(Gatan 682), and double hole arrays were made on the quartz substrate
via focused-ion-beam milling (Dual-beam FIB 235, FEI Strata). After the
focused-ion-beam process, the Ag film on the surface of the quartz was
removed with HNO₃, followed by the thermo-assisted spin-coating
process for the sodium film described above.


Measurement of propagation of SPP mode on sodium-quartz 
interface 

The fabrication process for the plasmonic waveguide devices is shown
in Extended Data Fig. 3. For effective coupling and decoupling of SPPs
along the sodium-quartz interface, the quartz substrate was first milled
with a focused ion beam to generate periodic patterns at both coupling
and out-coupling positions. The laser (Fianium SC-400-4 Compact) was
focused onto the sodium nanostructures by an objective lens.
Coupled-out signals were also captured by the objective lens. Finally,
both input and output signals were collected with an infrared
charge-coupled device (XEVA-1083, Xenics) via a beam splitter. We measured
several different propagation separations and fitted the intensity decay
curve to obtain the propagation lengths at different wavelengths on the
sodium-quartz interface.


Comparison of sodium plasmonic waveguides with silver-based 
ones 

We fabricated and measured the $\delta_{\mathrm{SPP}}$ of both sodium and
silver plasmonic waveguides, as shown in Extended Data Fig. 4a–c. Sodium
supports a longer $\delta_{\mathrm{SPP}}$ than silver in the wavelength
range >1 μm. It should be noted that, besides $\delta_{\mathrm{SPP}}$, the
effective mode size is another essential parameter of a plasmonic
waveguide, characterizing the field-confinement capability. It is
represented by the decay lengths (skin depths) in the dielectric
$\delta_d$ and in the metal $\delta_m$. Therefore, for evaluating
the plasmonic waveguide, comparing $\delta_{\mathrm{SPP}}$ alone is not
comprehensive, and the figure of merit of the plasmonic waveguide
($\delta_{\mathrm{SPP}}/(\delta_d + \delta_m)$) is usually used. In
Fig. 2g we benchmarked our sodium plasmonic waveguide against the two
silver-based references. To further clarify this point, we have
calculated the figure of merit of the plasmonic waveguide for Ag
plasmonic waveguides from the dielectric functions reported in a
number of representative publications’”"°*** *”. From Extended Data
Fig. 4d, one can conclude that our sodium-based plasmonic waveguide has
state-of-the-art performance.
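
How such a waveguide figure of merit follows from a dielectric function can be sketched with the textbook dispersion relation of an SPP on a single flat metal-dielectric interface. This is an idealized model adopted here as an assumption, not necessarily the authors' calculation. In the Python sketch below the function name is hypothetical, the metal permittivity is the Drude-Lorentz fit of equation (2) evaluated at 1,500 nm, and the dielectric permittivity of 2.09 (refractive index of about 1.445) is an assumed value for the quartz substrate.

```python
import cmath
import math


def spp_figure_of_merit(eps_metal: complex, eps_dielectric: float,
                        wavelength_um: float):
    """Return (delta_spp, delta_d, delta_m, fom), lengths in micrometres.

    Idealized single-interface SPP:
      k_spp     = k0 * sqrt(eps_m * eps_d / (eps_m + eps_d))
      delta_spp = 1 / (2 Im k_spp)            (1/e intensity decay length)
      delta_j   = 1 / Re sqrt(k_spp^2 - eps_j k0^2)   (1/e field decay)
    """
    k0 = 2.0 * math.pi / wavelength_um
    k_spp = k0 * cmath.sqrt(eps_metal * eps_dielectric
                            / (eps_metal + eps_dielectric))
    delta_spp = 1.0 / (2.0 * k_spp.imag)
    kappa_d = cmath.sqrt(k_spp**2 - eps_dielectric * k0**2)
    kappa_m = cmath.sqrt(k_spp**2 - eps_metal * k0**2)
    delta_d = 1.0 / kappa_d.real
    delta_m = 1.0 / kappa_m.real
    return delta_spp, delta_d, delta_m, delta_spp / (delta_d + delta_m)


# eps_metal: equation (2) evaluated at 1,500 nm with the quoted fit parameters.
# eps_dielectric = 2.09 is an ASSUMED value for the quartz substrate.
delta_spp, delta_d, delta_m, fom = spp_figure_of_merit(
    complex(-42.11, 0.598), 2.09, 1.5)

if __name__ == "__main__":
    print("delta_spp (um):", delta_spp)
    print("delta_d (um):", delta_d, " delta_m (um):", delta_m)
    print("figure of merit:", fom)
```

The model ignores surface roughness, the coupler structures and any packaging layers, so it indicates the scaling of the figure of merit with the dielectric function rather than predicting a measured value.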


Detailed fabrication process for plasmonic nanolasers 

The detailed fabrication process of the plasmonic nanolasers is shown
in Extended Data Fig. 5. Electron-beam lithography and inductively
coupled plasma etching were used to pattern InGaAsP MQWs nanodisks
onto an epi-wafer (Extended Data Fig. 5a, b). A 500-nm-thick SiO₂ layer
was deposited on the nanodisks by chemical vapour deposition (Extended
Data Fig. 5c). Then the whole sample was bonded onto quartz using
a benzocyclobutene adhesive layer (Extended Data Fig. 5d), after
which the InP substrate was removed by wet etching using HCl
solution. Atomic layer deposition was used to deposit 7-nm-thick Al₂O₃
onto the nanodisk (Extended Data Fig. 5e). Finally, the sodium film
was spin-coated onto the nanodisks and the sample was packaged onto
quartz with epoxy (Extended Data Fig. 5f).


Optical characterization of plasmonic nanolasers 

Sodium-based plasmonic nanolasers were optically pumped by a
nanosecond pump laser (1,064 nm, pulse length 5 ns, repetition rate
12 kHz), focused on the sample with an objective lens (50×, numerical
aperture 0.42). The pump laser spot is about 7 μm in diameter. The
emission from a nanolaser was collected by the same objective and guided
to an InGaAs infrared camera and a near-infrared spectrometer. All
experiments were carried out at room temperature.


Numerical simulation of the sodium-based plasmonic nanolaser 
We carried out three-dimensional full-wave simulations using the commercial
software COMSOL Multiphysics (radio-frequency module) to calculate
the optical modes of the nanolasers. The Q factors of the cavity
modes were evaluated as $Q = f_0/\delta f$, where $f_0$ and $\delta f$
represent the resonance frequency and the full-width at half-maximum of
the resonance spectrum, respectively.
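
As a small worked example of this definition, the Python sketch below converts between a quality factor and a linewidth. For a narrow resonance, $Q = f_0/\delta f$ is approximately $\lambda_0/\delta\lambda$, which is the approximation used for the wavelength-domain helpers; the function names are illustrative. The example inputs are the simulated mode at 1,257 nm and its quality factor of about 340 quoted in the main text.

```python
C_NM_PER_S = 2.99792458e17  # speed of light in nm/s


def q_from_frequency(f0_hz: float, fwhm_hz: float) -> float:
    """Quality factor Q = f0 / delta_f."""
    return f0_hz / fwhm_hz


def q_from_wavelength(lambda0_nm: float, fwhm_nm: float) -> float:
    """Narrow-line approximation Q ~ lambda0 / delta_lambda."""
    return lambda0_nm / fwhm_nm


def linewidth_nm(lambda0_nm: float, q: float) -> float:
    """Linewidth in nm implied by Q at a given resonance wavelength."""
    return lambda0_nm / q


# Mode at 1,257 nm with Q of about 340 (values quoted in the text).
width = linewidth_nm(1257.0, 340.0)

# Cross-check in the frequency domain using delta_f ~ c * delta_lambda / lambda0^2.
f0 = C_NM_PER_S / 1257.0
delta_f = C_NM_PER_S * width / 1257.0**2
q_check = q_from_frequency(f0, delta_f)
```

The implied linewidth is a few nanometres, which is the scale on which the simulated Lorentz-shape profiles of Extended Data Fig. 7b would be drawn.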


Comparison of sodium-based plasmonic nanolasers with 
noble-metal-based ones 

Extended Data Table 1 shows a comparison of sodium-based plas- 
monic nanolasers with reported noble-metal-based ones operated 
in the near-infrared range*® ”. In the table, the feedback mechanism, 
physical size, gain medium, emission wavelength, pump condition, 
working temperature and lasing threshold in peak power density are 


listed for comparison. Extended Data Fig. 8 shows the lasing
thresholds and physical volumes of reported room-temperature plasmonic
nanolasers in the near-infrared range.

Extended Data Fig. 1 | The thermo-assisted spin-coating procedure for the low-loss sodium film. a, Optical images of the procedure. b, Atomic force
microscope topology of the smooth quartz surface, which shows that the root mean square (RMS) roughness is about 100 pm.

Extended Data Fig. 2 | Dielectric function measurements for sodium and
comparison with silver. a, Schematic of ellipsometer measurements. The
incident light transmits through the upper surface to the lower surface of the
silica layer (contact with the sodium interface) and reflects back. The light
reflected from the sodium/glass interface can be detected. b, Comparison of
real (a) and imaginary (b) parts as well as the figure of merit
($-\varepsilon_1/\varepsilon_2$) (c) of the
dielectric functions of silver and sodium (this work). The data for silver are
taken from the literature!”!873434-37.

Extended Data Fig. 3 | Detailed fabrication process of the plasmonic waveguides. FIB, focused ion beam.

Extended Data Fig. 4 | Comparison of silver and sodium plasmonic
waveguides. a, SEM image of the launching and outcoupling structures of
plasmonic waveguides. b, The fitted $\delta_{\mathrm{SPP}}$ at different wavelengths on the quartz
interface. The dashed line is a fit to the data. c, Propagation measurements at
wavelengths of 1,180 nm and 1,450 nm for sodium and silver, with the
exponential curves fitted to the data. d, Figures of merit for sodium and silver,
defined as the ratio of propagation length to skin depth,
$\delta_{\mathrm{SPP}}/(\delta_d + \delta_m)$. The
silver data are calculated using the dielectric function, taken from a number of
representative publications!”!873434 37.

Extended Data Fig. 5 | Detailed fabrication process of the sodium-based plasmonic nanolaser. BCB, benzocyclobutene; HSQ, hydrogen silsesquioxane; EBL,
electron-beam lithography; ALD, atomic layer deposition; CVD, chemical vapour deposition.

Extended Data Fig. 6 | Emission spectra (semi-logarithmic scale) of the plasmonic nanolaser shown in the main text for different pump powers. When the
pump intensity is 475 kW cm⁻², the device shows single-mode lasing with a side-mode suppression ratio of 20 dB.

Extended Data Fig. 7 | Numerical simulation of the cavity modes of the
sodium-based plasmonic nanolaser. a, Measured emission spectra of the
plasmonic nanolaser around the lasing threshold, which exhibit two clear
cavity modes at wavelengths of 1,257 nm and 1,343 nm, respectively.
b, Spectrum profile of the two cavity modes, where the resonant wavelength
and the linewidth are obtained from the simulation. We assumed
Lorentz-shape spectrum profiles here. The simulation shows that the cavity
modes at wavelengths of 1,257 nm and 1,343 nm can be identified as plasmonic
whispering-gallery modes with azimuthal orders of 15 and 14, respectively, and
the corresponding quality factors Q of these two modes are 340 and 236,
respectively. c, d, Top (c) and side (d) views of simulated electric-field
distributions of the plasmonic whispering-gallery mode with azimuthal order
of 15. e, Electric-field profile $E$ extracted from the white dashed line in d.
f, g, Top (f) and side (g) views of simulated electric-field distributions of the
plasmonic whispering-gallery mode with azimuthal order of 14. h, Electric-field
profile extracted from the white dashed line in g.

Extended Data Fig. 8 | Comparison of our work to reported plasmonic lasers in terms of the threshold and cavity size. Blue square symbols represent
reported plasmonic nanolasers in the literature. The red star represents our sodium-based plasmonic nanolaser. λ is the lasing emission wavelength in free space.


[Extended Data Fig. 9, plot panels; only the labels below are recoverable.
a, spectrum versus wavelength (400-900 nm), with a curve labelled Quartz.
c, resonance wavelength and intensity versus time (0-35 days) under 60 °C
heating and 40% humidity. e, spectrum versus wavelength (500-900 nm).
g, normalized intensity versus wavelength (nm), after 6 months at
152 kW cm⁻².]
Extended Data Fig. 9 | Stability test for sodium-based reflective plasmonic
[...] fabricated device in terms of the resonance wavelength and intensity at room
temperature and air humidity 70%. e, f, Stability of the sodium mirror.
e, Reflectance spectrum of the sodium mirror against a standard silver mirror
(Thorlabs, PF10-03-P01). f, Reflectance of the packaged sodium mirror at
wavelength λ = 750 nm over 120 days in air. g, Lasing spectrum of a
sodium-based plasmonic nanolaser after six months.


Extended Data Table 1| Performance comparison of near-infrared plasmonic nanolasers 


The production of large single-crystal metal foils with various facet indices has long
been a pursuit in materials science owing to their potential applications in crystal
epitaxy, catalysis, electronics and thermal engineering’ ©. For a given metal, there are
only three sets of low-index facets ({100}, {110} and {111}). In comparison, high-index
facets are in principle infinite and could afford richer surface structures and
properties. However, the controlled preparation of single-crystal foils with high-index
facets is challenging, because they are neither thermodynamically®’ nor kinetically?
favourable compared to low-index facets® ®. Here we report a seeded growth
technique for building a library of single-crystal copper foils with sizes of about
30 × 20 square centimetres and more than 30 kinds of facet. A mild pre-oxidation of
polycrystalline copper foils, followed by annealing in a reducing atmosphere, leads to
the growth of high-index copper facets that cover almost the entire foil and have the
potential of growing to lengths of several metres. The creation of oxide surface layers
on our foils means that surface energy minimization is not a key determinant of facet
selection for growth, as is usually the case. Instead, facet selection is dictated
randomly by the facet of the largest grain (irrespective of its surface energy), which
consumes smaller grains and eliminates grain boundaries. Our high-index foils can be
used as seeds for the growth of other Cu foils along either the in-plane or the
out-of-plane direction. We show that this technique is also applicable to the growth of
high-index single-crystal nickel foils, and we explore the possibility of using our
high-index copper foils as substrates for the epitaxial growth of two-dimensional
materials. Other applications are expected in selective catalysis, low-impedance
electrical conduction and heat dissipation.


At present, high-index metal foils are mainly obtained by cutting 
bulk single-crystal metal ingots or by epitaxial deposition on other 
non-metal single crystals with a high-index facet’””°. Thus, only a few 
index choices are available, and the cost is very high. Additionally, in 
the typical cutting and polishing method, accurate index control and 
flat surfaces with uniformly parallel step edges are very challenging to 
realize (with typically a 1°-3° tolerance in the cutting angle of single 
crystals from commercial products). Hence, new methods that enable 
efficient production of large single-crystal metal foils with various 
high-index facets are in great demand. 


Recently, the synthesis of noble metal nanocrystals with
high-index facets by electrochemical reaction, electrodeposition
or solution-phase synthesis has been reported**. The basic idea in
high-index noble metal nanocrystal growth is to perturb the
thermodynamic equilibrium state of a low-index facet or to modify the
kinetic barrier to the growth of a high-index seed. In our experiment,
we first developed a pre-oxidation treatment technique to generate a
large grain seed with a high-index facet in a commercial polycrystalline
Cu foil. The Cu foils were oxidized at 150-650 °C in air, typically for
a few hours, and then annealed in a reducing atmosphere at 1,020 °C for
several hours (see Methods for more details). The as-annealed Cu foil
was heated in air at about 200 °C for 2-30 min before optical imaging.
Owing to the difference in oxidation barriers, different types of facets
show distinct colours, and a single-crystal Cu domain can be clearly
observed as a homogeneous region in an optical image”. After thermal
annealing, the whole ~39 × 21 cm² commercial polycrystalline Cu foil
was transformed into a large single crystal (see Fig. 1a, where only
the small dark orange zone at the bottom-left corner is
polycrystalline), which is the largest single-crystal Cu foil reported
so far, to our knowledge. This size is made possible by our large
annealing furnace, with a heating area of length 50 cm and diameter
23 cm, which provides a highly stable temperature distribution. The
X-ray diffraction (XRD) 2θ scan (Fig. 1b) reveals that after
pre-oxidation and subsequent annealing, the facet of the single crystal
is not Cu(111), as reported before”, but Cu(224) or Cu(112) (only
facets with Miller indices h, k and l all even or all odd give a
non-zero 2θ scan signal in XRD owing to the extinction rule). A further
azimuthal off-axis φ scan shows only one peak corresponding to Cu(200)
(inset in Fig. 1b), demonstrating that the as-annealed Cu foil is a
single crystal without in-plane rotation. To further confirm its crystal
structure, single-crystal XRD was performed, and lattice reconstruction
using a standard procedure confirms the (112) facet (Fig. 1c). Electron
back-scattered diffraction (EBSD) was also applied to analyse the
surface crystallography. The uniform violet colour in the inverse pole
figure (IPF) map verifies the (112) facet index (top panel in Fig. 1d);
the kernel average misorientation (KAM) map (bottom panel in Fig. 1d)
shows that the local average misorientation between each measured point
and its nearest neighbours is extremely small (typically less than
0.3°), indicating that the as-annealed Cu foil is a homogeneous single
crystal. The (112) lattice structure of the as-annealed Cu (schematic in
Fig. 1e) was also demonstrated in both reciprocal space and real space
by low-energy electron diffraction (LEED; Fig. 1f) and scanning
transmission electron microscopy (STEM; Fig. 1g), respectively.

Fig. 1 | Characterization of a Cu(112) single crystal annealed from a
commercial polycrystalline foil. a, Optical image of a single-crystal Cu(112)
after mild oxidation in air at 200 °C. The single-crystal Cu(112) shows a
homogeneous colour (the small darker pattern at the bottom-left corner is the
polycrystalline part). The foil size is 39 × 21 cm². The rulers shown in all
figures indicate millimetres unless stated otherwise. b, XRD 2θ scan data for
the annealed Cu foil with (224) or (112) orientation. Inset, azimuthal off-axis
φ scan spectrum with only one peak corresponding to Cu(200), confirming
single-crystal features without in-plane rotation. c, Reconstructed
single-crystal XRD pattern of the Cu(112) foil. d, Representative IPF (top) and
KAM (bottom) maps of a Cu(112) foil. e, Schematic diagram of the Cu(112)
surface. The orange, green and blue balls correspond to the first, second and
third layers of Cu atoms from the surface, respectively. The red dashed box
indicates the primitive cell for the first layer of Cu(112). f, Representative LEED
pattern of the as-prepared Cu(112) foil. g, Atomically resolved STEM image of
the Cu(112) foil. Inset, fast Fourier transform pattern of the STEM image.
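
The extinction rule invoked in the text for the XRD 2θ scan can be made concrete with a short sketch. For a face-centred cubic lattice such as Cu, a reflection (hkl) has non-zero intensity only when h, k and l are all even or all odd, which is why a (112)-oriented foil shows up through its second-order (224) reflection. The Python function names below are illustrative, not taken from the article.

```python
def fcc_reflection_allowed(h: int, k: int, l: int) -> bool:
    """True when h, k and l are all even or all odd (fcc extinction rule)."""
    return len({h % 2, k % 2, l % 2}) == 1


def lowest_allowed_order(h: int, k: int, l: int):
    """Lowest-order allowed reflection (nh, nk, nl) for a given fcc facet.

    n = 2 always works, because doubling makes every index even.
    """
    for n in (1, 2):
        if fcc_reflection_allowed(n * h, n * k, n * l):
            return (n * h, n * k, n * l)
    raise AssertionError("unreachable: doubled indices are all even")


if __name__ == "__main__":
    for facet in [(1, 1, 1), (1, 0, 0), (1, 1, 2), (2, 3, 5)]:
        print(facet, "->", lowest_allowed_order(*facet))
```

The same check explains why the off-axis scan in Fig. 1b is indexed as Cu(200) rather than Cu(100).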


With our design, the Cu(111) facet is no longer the only
thermodynamically favourable facet, and a high-index facet, such as
Cu(112), can be successfully synthesized. We repeated the procedure on
many Cu foils, and more than 30 kinds of high-index single crystal with
large domain sizes were prepared (length 25-39 cm, width 21 cm). Eight
representative kinds of Cu foil with different facet indices and a
typical size of 35 × 21 cm² are shown in Fig. 2a (the large uniform
regions are single crystals). The single-crystal nature of the annealed
Cu foils was further confirmed by the distinct colours in the
corresponding EBSD IPF maps (Fig. 2b). The facet indices were identified
by the characteristic peaks in the XRD 2θ scan spectra (Fig. 2c) and
reconstructed single-crystal XRD data (Extended Data Fig. 1). Some
high-index surface structures were further revealed by atom-resolved
scanning tunnelling microscopy (STM) images (Extended Data Fig. 2), in
which the spacing between atomic stripes matches the theoretical values
from the atomic model.

To verify the essential role of the pre-oxidation treatment in producing
high-index facets (Fig. 3a, stage 1), we carried out a control experiment
in which we locally oxidized one end of a piece of polycrystalline
Cu foil while the other end remained intact (Fig. 3b). After the annealing 
procedure, the pre-oxidized part was transformed into Cu(235) and the 
untreated part was transformed into Cu(111) (Fig. 3c, Extended Data 
Fig. 3). Statistical data further revealed that only Cu(111) foils were 
obtained without pre-oxidation; when the Cu foils were pre-oxidized, 
more than 30 types of facet were observed (Fig. 3d, e, Extended Data 
Table 1). 

The above observations clearly reveal that pre-oxidation must induce 
some critical factor that controls the formation of high-index seeds. 
It is well established that the stored strain energy and the surface energy 
in a thin metal film are competitive driving forces for abnormal grain
growth during annealing, with the former leading to abnormal Cu(001) 
grains because the smallest biaxial modulus of Cu is along the (001) 
direction’, and the latter giving the facet with the minimum surface 
energy. Before the start of the abnormal grain growth, our thin (only 
Fig. 2 | Single-crystal Cu foils of A4 paper size with various facet indices.

a, Optical image of the eight representative types of single-crystal Cu foil with
typical size of 35 × 21 cm². A mild oxidation treatment was performed to quickly
identify the single-crystal region in the foils (the uniform areas are single 
crystals; the coloured corners are polycrystalline). The ruler indicates 
centimetres. b, EBSD IPF maps of the eight kinds of single-crystal Cu foil. The 
distinct colours reveal their different facet indices. These maps have the same 


25 μm thick) Cu foil was annealed at 1,020 °C for a long time; thus, most
of the stored strain energy was released, and the abnormal grain growth
was mainly driven by the surface energy. As a consequence, annealing
of an original Cu foil results in a Cu(111) single crystal with the lowest 
surface energy among all possible facets” *. 

On the other hand, when Cu is mildly oxidized, both of its upper 
and lower surfaces are covered by a layer of Cu₂O grains with different
orientations (Extended Data Fig. 4). Thus, the two free surfaces of a Cu
grain are transformed into two Cu-Cu₂O interfaces after pre-oxidation,
and then the interface energy, which is the summation of the energies
of all types of interfaces between many small polycrystalline Cu₂O
grains and the Cu foil, depends weakly on the orientation of the Cu
grain itself. Therefore, grain seeds with various surface indices (Cu(hkl))
have certain probabilities of growing abnormally until the oxide layer
is completely reduced. With further annealing, the Cu(hkl) grain seed
with an advantage in size spreads to the whole Cu foil by abnormal 
grain growth (Fig. 3b, stage 2). 

Unlike abnormal grain seeding, which is mainly driven by thermodynamics,
abnormal grain growth is mainly controlled by kinetics. The
large abnormal grain consumes the surrounding small normal grains
and eliminates the grain boundaries in the Cu foil. In such a process,
excessive oxygen at the grain boundaries acts as pinning centres and 
lowers the mobility of the grain boundaries”. Introducing hydrogen 
during annealing removes excessive oxygen from the Cu foil and accel- 
erates the abnormal grain growth. In addition, the designed in-plane 
temperature difference throughout the Cu foil facilitates only one 
abnormal grain acting as a seed in the high-temperature region in most
size. c, XRD 2θ scan spectra of the eight kinds of single-crystal Cu foil. The
characteristic peaks are used to identify the facet indices (owing to the
extinction rule in XRD, only peaks corresponding to (244), (246), (446) and
(466) appear, respectively, for the (122), (123), (223) and (233) facets). The X-ray
source is a silver-based target with a wavelength of about 0.56 Å and all the peaks
are normalized with respect to the intensity maximum. 


annealing processes (Extended Data Fig. 5). We also notice that the sur- 
face energy of pure Cu may play some role during the abnormal grain 
growth, favouring the formation of low-energy facets, because some 
parts of the oxide layer could be reduced to Cu during the annealing in 
the abnormal grain seeding stage** *° (Fig. 3f, Extended Data Table 1). 

Using this technique, we can readily obtain large single-crystal Cu 
foils with various high-index facets. However, the appearance of a certain
high-index seed is a random event and cannot be experimentally
designed. To controllably produce the desired high-index single-crystal 
Cu foils, we further propose a facet ‘transfer’ method to copy the facet of 
the obtained high-index single crystals. First, a small piece of high-index 
single-crystal Cu, cut from a large single-crystal Cu foil obtained by
pre-oxidation annealing, was placed on a large polycrystalline Cu foil
to serve as a new seed. When annealed at a high temperature (but below
the melting point of the bulk Cu), the surface of the seed started to
assimilate into the polycrystalline Cu, and the single-crystal lattice
arrangement had a very high chance (>98% under our current experimental
conditions) of transferring to the polycrystalline Cu, resulting
in a ‘nucleus’ with the same facet as the seed (Extended Data Fig. 6a).
The time evolution of the thermal annealing process verifies that the 
facet ‘transfer’ indeed works, as shown by the example Cu(245) single 
crystal. The triangular single-crystal Cu(245) piece first assimilated 
from the top surface, on which the seed was placed, to the bottom 
surface of the polycrystalline Cu foil (Fig. 4a, b), and a new Cu(245) 
seed was formed in the polycrystalline foil that then spread into a large
single crystal (Fig. 4c, d). The identical colour of the EBSD IPF maps
(Fig. 4e), together with the XRD 2θ and φ scan data for the initial seed
Fig. 3 | Pre-oxidation-guided seeded growth of single-crystal Cu foils with
high facet indices. a, Schematic diagrams of the two stages of the fabrication
of high-index single crystals. Stage 1: without an oxide layer, Cu(111) has the
lowest surface energy and is the prevailing growing facet. Once the oxide layer
is formed, high-index Cu(hkl) can grow, and form the abnormal grain seed.
Stage 2: the large abnormal grain seed consumes the surrounding small normal 
grains and eliminates the grain boundaries in the Cu foil. b, c, Optical images of 
a Cu foil before (b) and after (c) annealing. One terminal of the Cu foil was 
pre-oxidized locally and the other one was kept intact before annealing. After 
annealing, the pre-oxidized terminal transformed into Cu(235) and the intact 


and the final obtained Cu (Extended Data Fig. 6b-f), showed that the 
facet of the seed was successfully copied and a new large-size Cu(245) 
single crystal was produced. In addition to this in-plane facet transfer, 
we also found that a Cu foil can be transformed into ‘bulk’ single-crystal 
Cu by an out-of-plane facet transfer process, as shown in Fig. 4f. Here, 
the length of each foil was deliberately increased by about 2 mm with
respect to the previous one to facilitate characterization of the obtained
single crystal. The identical colour of the EBSD IPF maps for the differ- 
ent layers of Cu foils after cyclic facet ‘transfer’, together with the XRD 
characterization for the whole thick Cu foil, demonstrated that the 
single-crystal structure of the first Cu(256) layer was successfully trans- 
ferred throughout the foil (Fig. 4g—i, Extended Data Fig. 6g), revealing 
a new facile way to prepare a bulk single-crystal Cu slab with controlled
crystal orientation. The above processes of vertical assimilation and 
in-plane growth were explained and reproduced by our molecular 
dynamics simulations (Extended Data Fig. 7). 

To demonstrate the universality of this seeded abnormal grain growth 
technique, we attempted to anneal nickel (Ni) using a similar approach 
at about 1,200 °C. Three types of high-index single-crystal Ni foil with 
(012), (013) and (355) facets, typically with a size of about 5 × 5 cm², were
obtained (Fig. 4j-n, Extended Data Fig. 8). The Ni single-crystal size was 
limited by the diameter of the alundum tubes in our custom-designed 
high-temperature chemical vapour deposition system. In addition to 
Cu and Ni, we expect that similar approaches could be applied to grow


one changed into Cu(111). d, e, IPFs in the normal direction for the distribution 
of facet indices of Cu single crystals annealed without (d) and with (e) 
pre-oxidation. 35 kinds of Cu facet appeared with the pre-oxidation treatment, 
in striking contrast to the occurrence of only the Cu(111) facet without 
pre-oxidation. f, Appearance frequency of facets as a function of their surface 
energy for the single-crystal Cu foils obtained. The general trend shows that 
facets with lower surface energies can be achieved more easily. All the samples 
were pre-oxidized under the same conditions (150 °C for 2h) and had almost 
the same oxide thickness (~50 nm). 


single-crystal foils of many other metals (such as Fe, Al, Ag and Pt) and
their alloys. Moreover, we found that the epitaxial growth of graphene 
and hexagonal boron nitride (hBN) is also achievable on high-index 
facets (Extended Data Fig. 9), not only on Cu(111) (refs. ”””) or vicinal 
Cu(110) (ref. 8), as previously reported. We therefore expect further 
advancement in the fundamental exploration and technical applica- 
tions of these large high-index single-crystal metal foils. 
Methods


Seeded growth of single-crystal high-index Cu foils 

Pre-oxidation of commercial Cu foils. A polycrystalline Cu foil (25 μm
thick, 99.8%, Sichuan Oriental Stars Trading Co. Ltd, #Cu-1031) was
placed on a quartz substrate and loaded into a chemical vapour deposition
(CVD) furnace (Tianjin Kaiheng Co. Ltd, custom designed with
a heating area of length 50 cm and diameter 23 cm). The furnace was
slowly heated to 150-650 °C in 10-60 min and then maintained at this
temperature for typically 1-4 h to oxidize the Cu surface.


Annealing of commercial Cu foils. The pre-oxidized Cu foil was placed 
on a quartz substrate and loaded into an atmospheric-pressure CVD 
system. The system was heated to 1,020 °C in 1 h and maintained at this
temperature under a gas flow of 800 standard cubic centimetres per
minute (sccm) Ar and 50 sccm H₂ for 3-10 h, and was then naturally
cooled to room temperature in an Ar and H₂ atmosphere. The H₂ is
necessary in the annealing process and the large single-crystal growth
cannot occur in a pure Ar atmosphere.


Seeded growth of single-crystal high-index Ni foils 

The Ni foils (100 μm thick, 99.994%, Alfa Aesar) were first oxidized in air
at 150-650 °C for 1-4 h and then annealed in a reducing atmosphere at
1,200 °C for 3-6 h. After thermal annealing, single-crystal Ni foil of size
~5 × 5 cm² was successfully obtained. By repeating the typical annealing
procedure, several high-index single-crystal Ni foils can be produced. 


Measurement of the in-plane temperature distribution of our 
furnace 

The temperature distribution was first simulated, as shown in Extended 
Data Fig. 5a. The maximum temperature difference was measured by a 
melting experiment using Ag foils. Nine pieces of Ag foil were placed 
into our furnace at different positions and were then annealed at 955 °C, 
960 °C and 965 °C. With increasing temperature, different parts melted 
(Extended Data Fig. 5b-d), and the maximum temperature difference 
was estimated to be ~10 °C.


Growth of graphene and hBN 

Growth of graphene. The high-index single-crystal Cu foil was placed 
on a quartz substrate and loaded into the CVD chamber. After heating
to 1,010 °C under a reducing atmosphere (500 sccm Ar, 10 sccm H₂),
CH₄ (0.5 sccm) was introduced into the system for 10-30 min. Then,
the system was naturally cooled under the same reducing atmosphere. 
The growth was carried out under atmospheric pressure. 


Growth of hBN. The as-grown single-crystal Cu foil was pre-heated 
and re-annealed at 1,000 °C for 30 min with a gas mixture of Ar and H₂
(500 sccm Ar, 10 sccm H₂) under atmospheric pressure. A precursor of
ammonia borane was nested in a quartz crucible and heated at 65 °C for
sublimation, and was then carried by Ar and H₂ (250 sccm Ar, 250 sccm
H₂). To visualize the individual hBN domains, the growth time was set to
be 1h. The growth was carried out under a pressure of ~200 Pa. 


Characterization 

The EBSD characterizations were carried out using a PHI 710 Scanning 
Auger Nanoprobe instrument. LEED measurements were performed 
using an Omicron LEED system in ultrahigh vacuum with base pressure
<3 x10” Pa. STEM experiments were performed in a FEI Titan Themis
G2 300 system operated at 300 kV. SEM images were obtained using 
a FEI Nova NanoSEM 430 scanning electron microscope. 

XRD 2θ scan measurements were conducted using a Bruker D8
Advance system with a silver target with X-ray wavelength (~0.56 Å)
shorter than that of the copper target (~1.54 Å) and more suitable for
high-index facet characterization with small facet spacing. XRD φ scan
measurements were conducted using a PANalytical X'Pert Pro system
with a copper target. Single-crystal XRD measurements were conducted
using a Bruker D8 Venture system. The primitive cell of the sample was 
determined by single-crystal XRD, after which the facet index of the 
Cu foil was obtained from the reconstructed facet that was parallel 
to the foil surface. 
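
The choice of the short-wavelength silver target can be illustrated with Bragg's law together with the fcc extinction rule mentioned in the Fig. 2 caption. The sketch below is only an illustration: the lattice constant `A_CU` (3.615 Å) is an assumed textbook value that does not appear in the text, and the function names are hypothetical.

```python
import math

A_CU = 3.615  # assumed fcc Cu lattice constant in angstrom (not stated in the text)


def first_allowed_reflection(hkl):
    """Return the lowest-order harmonic n*(hkl) allowed for an fcc lattice.

    The fcc extinction rule permits a reflection only if its indices are
    all odd or all even; doubling any set of indices makes them all even.
    """
    for n in (1, 2):
        candidate = tuple(n * i for i in hkl)
        if len({i % 2 for i in candidate}) == 1:
            return candidate
    raise ValueError("unreachable: n = 2 always gives all-even indices")


def two_theta_deg(hkl, wavelength, a=A_CU):
    """Bragg angle 2-theta in degrees, or None if the peak cannot be reached."""
    d = a / math.sqrt(sum(i * i for i in hkl))  # interplanar spacing of a cubic lattice
    s = wavelength / (2.0 * d)                  # sin(theta) from Bragg's law
    if s > 1.0:
        return None
    return 2.0 * math.degrees(math.asin(s))


if __name__ == "__main__":
    for facet in [(1, 2, 2), (1, 2, 3), (2, 2, 3), (2, 3, 3)]:
        peak = first_allowed_reflection(facet)
        print(facet, peak, two_theta_deg(peak, 0.56), two_theta_deg(peak, 1.54))
```

Under these assumptions the (122) facet is identified through its (244) harmonic, which is within reach at about 0.56 Å but has no Bragg solution at about 1.54 Å.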

STM experiments were performed with a Createc system (Germany) 
under ultrahigh vacuum (-1x10~ torr). To remove the contamination, 
the Cu foils were cleaned by Ar⁺ sputtering at 1 keV and annealing at 700 K
for 2-5 cycles. They were then transferred into the liquid-nitrogen-cooled
(77 K) scanner for high-resolution imaging. All the STM images were 
taken at the set point of 100 mV, 50 pA. Throughout the experiments, 
the bias voltage refers to the sample voltage with respect to the 
tungsten tip. 


Calculation of Cu surface energy 

The surface energies of various Cu facets were obtained by density 
functional theory (DFT) calculations as implemented in the Vienna ab 
initio simulation package. The exchange-correlation effect was treated 
using the Perdew-Burke-Ernzerhof generalized gradient approxima- 
tion. The projected augmented wave method was used to describe the 
interaction between valence electrons and the ionic cores. To obtain 
the surface energy of a certain Cu facet, a thick Cu slab consisting of 
>6 atomic layers with this facet index was constructed. In addition, a 
thin Cu slab, which was similar to the thick one but with only half of its 
atomic layers, was also constructed. After that, both the thick and the thin
Cu slabs were fully optimized via the conjugate gradient method, until
the force on each atom was less than 0.01 eV Å⁻¹. During optimization,
a vacuum layer of at least 10 Å was used to avoid interaction between
periodic images. A separation of 0.03 Å⁻¹ was used for the k-point mesh
sampling. The energy converged to 10“ eV and an energy cut-off of 400 eV
for the plane-wave basis was adopted during structure optimization. The
surface energy (E) of one Cu facet can be calculated as E = (2E_thin − E_thick)/
(2A), where E_thick and E_thin are the energies of the thick and thin Cu slabs,
respectively, and A is the area of one of the two surfaces of the slab.
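
The two-slab construction can be checked with a short numerical sketch. Both slabs expose two surfaces of area A, and the thin slab holds half the atomic layers of the thick one, so the bulk contribution cancels in 2E_thin − E_thick and only 2A times the surface energy remains; the code uses that ordering so that the result is positive. The function name and the numbers in the example are hypothetical and are not taken from the DFT calculations.

```python
def surface_energy(e_thick, e_thin, area):
    """Surface energy from a thick slab and a thin slab with half as many layers.

    Assumes E_thick = N*e_bulk + 2*A*gamma and E_thin = (N/2)*e_bulk + 2*A*gamma,
    so that 2*E_thin - E_thick = 2*A*gamma.
    """
    return (2.0 * e_thin - e_thick) / (2.0 * area)


if __name__ == "__main__":
    # Hypothetical inputs: bulk energy per layer, surface energy and cell area.
    e_bulk, gamma, area = -3.7, 0.081, 10.0   # eV per layer, eV per A^2, A^2
    e_thick = 12 * e_bulk + 2 * area * gamma   # 12-layer slab
    e_thin = 6 * e_bulk + 2 * area * gamma     # 6-layer slab
    print(surface_energy(e_thick, e_thin, area))
```

Because the bulk term cancels exactly, the sketch recovers the surface energy that was put in, whatever bulk energy per layer is chosen.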


Calculation on grain boundary energies 

We calculated the grain boundary energies to understand the abnormal 
grain growth in a semiquantitative way. During the annealing process,
a small grain near an abnormal grain will be transformed into a part of
the abnormal grain via the propagation of the grain boundary between
them (Extended Data Fig. 3d). The energy change during such a grain
growth process can be estimated as ΔE = 2S_B(γ_A − γ_B) − (αLh)γ_AB, where
2S_B ≈ 2L² is the surface area of grain B (considering both sides of the
metal foil); L is the lateral size of grain B; αL is the estimated length
of the grain boundary between them, where α = 1-4 depends on the
shape of grain B (the value 4 represents a single square grain inside
the abnormal grain); h is the thickness of the metal foil; γ_A, γ_B and γ_AB
are the surface energies of grain A and grain B and the grain boundary
energy between grains A and B, respectively. For a thermodynamically
favourable annealing process, the energy change must satisfy ΔE =
L²[2(γ_A − γ_B) − (αγ_AB h/L)] < 0. Obviously, if γ_A − γ_B < 0, this requirement will
always be satisfied. However, if γ_A − γ_B > 0, we will have h/L > 2(γ_A − γ_B)/
(αγ_AB). For our experiment h = 25 μm, and our DFT calculations using a
model of a metal slab in vacuum showed that γ_A − γ_Cu(100) ≈ 0.01 eV Å⁻² and
γ_AB ≈ 0.04 eV Å⁻². Therefore, we have L ≈ 150 μm when γ_A is the maximum
surface energy and α = 3. Considering that the surfaces of the Cu foil
must be passivated by the hydrogen during the annealing process, the
surface energy difference γ_A − γ_Cu(100) should be much smaller than our
calculated value, which is based on a model in vacuum, so we expect that
the annealing of Cu foil with much larger normal grains (for example,
with L > 200 μm) is experimentally feasible.
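
The estimate of the limiting grain size can be reproduced numerically. The sketch below evaluates the energy change and the grain size at which it changes sign, using the values quoted in this section (foil thickness 25 μm, surface energy difference 0.01 eV Å⁻², grain boundary energy 0.04 eV Å⁻², shape factor 3); the function names are hypothetical.

```python
UM_TO_ANGSTROM = 1.0e4  # 1 micrometre = 10^4 angstrom


def energy_change(l_um, h_um, d_gamma, gamma_ab, alpha):
    """Energy change (eV) when the abnormal grain A consumes a small grain B.

    d_gamma is the surface energy of A minus that of B, and gamma_ab is the
    grain boundary energy, both in eV per square angstrom.
    """
    l = l_um * UM_TO_ANGSTROM
    h = h_um * UM_TO_ANGSTROM
    return 2.0 * l**2 * d_gamma - alpha * l * h * gamma_ab


def critical_grain_size(h_um, d_gamma, gamma_ab, alpha):
    """Lateral size L (micrometres) at which the energy change is zero."""
    return alpha * gamma_ab * h_um / (2.0 * d_gamma)


if __name__ == "__main__":
    print(critical_grain_size(25.0, 0.01, 0.04, 3))   # about 150 micrometres
    print(energy_change(100.0, 25.0, 0.01, 0.04, 3))  # negative: growth favourable
    print(energy_change(200.0, 25.0, 0.01, 0.04, 3))  # positive: growth unfavourable
```

A smaller surface energy difference, as expected for hydrogen-passivated surfaces, raises the critical size in inverse proportion.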


Molecular dynamics simulation 
The calculations were performed using the software package LAMMPS 
(large-scale atomic/molecular massively parallel simulator). The Cu-Cu 
interaction was modelled with the embedded-atom method. For the seeded
growth case (69,695 atoms in total), a seed (6 × 6 × 1.5 nm³) with in-plane
orientation (345) was initially placed in the middle of the polycrystalline
substrate (16 × 16 × 3 nm³), which was divided into 8 × 8 sectors with
distinct orientations. The system was heated from 150 K to 1,150 K at
a rate of 1 K ps⁻¹, and then annealed at 1,150 K for 20 ns. Simulations
were carried out using the NVT ensemble (constant number of molecules,
volume and temperature), where a Nosé-Hoover thermostat
was applied to fix the temperature. The velocity Verlet algorithm was
used with an integration time step of 1.0 fs. Periodic boundary conditions
were applied along the x and y directions.
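
As a rough consistency check on these settings, the number of integration steps and the expected atom count can be worked out directly. The sketch below is plain arithmetic rather than a LAMMPS input; the lattice constant of 0.3615 nm is an assumed value for fcc Cu that is not given in the text, and the function names are hypothetical.

```python
def md_steps(duration_ps, timestep_fs=1.0):
    """Number of integration steps needed to cover duration_ps."""
    return round(duration_ps * 1000.0 / timestep_fs)


def fcc_atom_estimate(volume_nm3, a_nm=0.3615):
    """Approximate atom count of an fcc crystal (4 atoms per cubic cell).

    a_nm is an assumed Cu lattice constant; grain boundaries are ignored.
    """
    return 4.0 * volume_nm3 / a_nm**3


if __name__ == "__main__":
    heating_ps = (1150.0 - 150.0) / 1.0   # 1 K per ps from 150 K to 1,150 K
    annealing_ps = 20.0 * 1000.0          # 20 ns
    print(md_steps(heating_ps), md_steps(annealing_ps))
    substrate = 16 * 16 * 3               # nm^3
    seed = 6 * 6 * 1.5                    # nm^3
    print(fcc_atom_estimate(substrate + seed))
```

Under these assumptions the heating ramp takes one million steps and the anneal twenty million, and the volume-based atom estimate lies within about one per cent of the 69,695 atoms quoted above.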


Data availability 


All related data generated and/or analysed during the current study 
are available from the corresponding author on reasonable request. 
Extended Data Fig. 1 | Reconstructed single-crystal XRD images of Cu foils with various indices. Corresponding to those shown in Fig. 2. The white facets are
parallel to the Cu foil surfaces. 
Extended Data Fig. 2 | Atomic surface structures of Cu surfaces with
high-index facets. a-d, Atomic structures of (112) (a), (122) (b), (133) (c) and
(223) (d) surfaces (top, side view; bottom, top view). e-h, Corresponding STM
images of the single-crystal Cu surfaces shown in a-d. These maps are of the
same size. The high-index facets feature a stripe-like structure. The measured
‘stripe’ spacings are consistent with calculated values from the atomic model.
Because the corrugation along the Cu stripe is much smaller than that
perpendicular to the stripe, owing to the strong delocalization of electrons
along the stripe, the atomic resolution of Cu atoms within the stripe is not very
high.
Extended Data Fig. 3 | Representative EBSD IPF maps and reconstructed
single-crystal XRD image of Cu foil and the calculation model for boundary
movement during abnormal grain growth. a, b, After annealing, the
intact part was transformed into Cu(111) (a) and the pre-oxidized part
was transformed into Cu(235) (b). The two maps are of the same size. c,
Reconstructed single-crystal XRD image of the Cu(235) facet. d, Growth of a
large grain by consuming small grains around it.
Extended Data Fig. 4 | Characterizations of the oxide layer on the Cu
surface and statistics of high-index facets obtained. a-c, SEM images of Cu
cross section after oxidation at 150 °C for 2 h (a), 250 °C for 2 h (b) and 500 °C
for 1 h (c). The samples were prepared using the focused ion beam technique,
and polymethyl methacrylate (PMMA) and platinum (Pt) were used as
protective layers to prevent damage to the sample from milling with the ion
beams. d, Cross-sectional TEM images of Cu oxidized in air at 500 °C for 1 h. An
oxidized layer was formed on the Cu surface. e, f, Selected-area electron
diffraction patterns of the Cu (e) and the oxide layer (f). Unlike the
single-crystal texture of the original Cu (e), the oxide layer has a polycrystalline
structure (f). g-i, Statistical sector diagrams of the facet indices obtained by
annealing the pre-oxidized Cu foils at conditions corresponding to a-c. The
occurrence probability of high-index Cu facets was reduced from ~82% (g;
150 °C in air for 2 h) to ~74% (h; 250 °C in air for 2 h) and ~67% (i; 500 °C in air for
1 h), but remained very high.


Extended Data Fig. 5 | Temperature-difference-driven single-seed 
abnormal grain growth. a, Simulated temperature distribution in the Cu foil. 
The upper and lower areas of the central part of the Cu foil have the highest 
temperature. b-d, Measurement of the temperature difference. Nine pieces of
Ag foil were placed into our furnace and were then annealed at 955 °C (b),
960 °C (c) and 965 °C (d). With increasing temperature, different parts melted;
the temperature difference was estimated to be ~10 °C. The image sizes for b-d
are ~40 × 22 cm². e-h, Typical single-crystal evolution in our abnormal grain
growth with a temperature difference.
Extended Data Fig. 6 | Facet transfer and representative XRD spectra of the
seed and the Cu foils with transferred facets. a, In-plane facet transfer by
placing a small single-crystal Cu(hkl) piece on polycrystalline Cu foils.
b-e, XRD 2θ (b) and φ (d) scan data for the seed, and the final Cu foil with the
transferred facet (c, e). The 2θ peak for the (245) facet is out of the XRD scan
range. f, Reconstructed single-crystal XRD image of the (245) facet, showing
that the (245) seed was successfully copied and a new large-size single-crystal
Cu(245) foil was produced. g, Reconstructed single-crystal XRD image of the
Cu(256) foil.
Extended Data Fig. 7 | Simulation of facet-transfer growth. a, Simulation
system. b-g, Structure evolution during the heating and preservation process.
Two cross-sections are presented here, marked by dark dashed lines in a. b
shows the initial configuration of the system, where the substrate is composed 
of 64 parts with diverse in-plane orientations. The seed is placed in the middle 
of the substrate surface. With increasing temperature, the part of the substrate 
that is in contact with the seed inherits the seed orientation first; then, the 


seeded grain has an advantage in size over the other grains and the orientation 
spreads into the entire foil (c-g). Different colours in b-g indicate distinct 
structures, that is, the red parts have a face-centred cubic (fcc) structure; the
white parts are grain boundaries; the green parts are stacking faults, twin
boundaries or grain boundaries; and the blue parts have a body-centred cubic
(bcc) structure.
Extended Data Fig. 8 | Crystallographic characterization of high-index single-crystal Ni foils. a, b, SEM images of polycrystalline (a) and single-crystal (b) Ni
foils with the same size. c—e, Reconstructed single-crystal XRD images of single-crystal Ni foils. The white facets are parallel to the Ni foil surfaces. 
Extended Data Fig. 9 | Epitaxial growth of graphene and hBN on high-index
single-crystal Cu foils. a-d, SEM images of unidirectionally aligned graphene
(Gr) domains on (112) (a), (113) (b), (133) (c) and (223) (d) facets. e-h, SEM images
of unidirectionally aligned hBN domains on (013) (e), (014) (f), (025) (g) and
(122) (h) facets.
Extended Data Table 1 | Surface energy (in eV nm⁻²) and facet characterization of single-crystal Cu foils


Facet index          011    012    013    014    023    025    034    111    112    113    115    122
Surface energy       9.90   9.91   9.92   9.79   9.86   9.95   9.74   8.10   9.40   9.60   9.30   9.20
2θ-scan XRD          ✓      ✓      ✓      ✓      ✓      ✓      ✓      ✓      ✓      ✓      ✓      ✓
Single-crystal XRD   ✓      ✓      ✓      ✓      ✓      ✓      ✓      ✓      ✓      ✓      ✓      ✓

Facet index          123    133    159    223    233    234    235    245    255    256    335    337
Surface energy       9.80   9.40   10.11  9.00   8.90   9.30   9.60   9.50   9.00   9.40   8.77   9.54
2θ-scan XRD          ✓      ✓      ✓      ✓      ✓      ✓      —      —      —      —      ✓      ✓
Single-crystal XRD   ✓      ✓      ✓      ✓      ✓      ✓      ✓      ✓      ✓      ✓      ✓      ✓

Facet index          344    345    346    355    356    357    359    377    455    456    577
Surface energy       8.70   9.20   9.40   8.90   9.30   9.62   9.72   9.30   8.70   8.90   8.60
2θ-scan XRD          —      —      —      ✓      —      ✓      ✓      ✓      —      —      ✓
Single-crystal XRD   ✓      ✓      ✓      ✓      ✓      ✓      ✓      ✓      ✓      ✓      ✓


‘✓’ indicates that the facet index has been determined by this method, and ‘—’ means that the corresponding 2θ peak for the facet is out of the XRD scan range.
Overall water splitting, evolving hydrogen and oxygen in a 2:1 stoichiometric ratio,
using particulate photocatalysts is a potential means of achieving scalable and
economically viable solar hydrogen production. To obtain high solar energy
conversion efficiency, the quantum efficiency of the photocatalytic reaction must be
increased over a wide range of wavelengths and semiconductors with narrow 
bandgaps need to be designed. However, the quantum efficiency associated with 
overall water splitting using existing photocatalysts is typically lower than ten per 
cent’. Thus, whether a particulate photocatalyst can enable a quantum efficiency of 
100 per cent for the greatly endergonic water-splitting reaction remains an open 
question. Here we demonstrate overall water splitting at an external quantum 
efficiency of up to 96 per cent at wavelengths between 350 and 360 nanometres, 
which is equivalent to an internal quantum efficiency of almost unity, using a modified 
aluminium-doped strontium titanate (SrTiO₃:Al) photocatalyst*’. By selectively
photodepositing the cocatalysts Rh/Cr₂O₃ (ref.°) and CoOOH (refs. **) for the
hydrogen and oxygen evolution reactions, respectively, on different crystal facets of 
the semiconductor particles using anisotropic charge transport, the hydrogen and 
oxygen evolution reactions could be promoted separately. This enabled multiple 
consecutive forward charge transfers without backward charge transfer, reaching the 
upper limit of quantum efficiency for overall water splitting. Our work demonstrates 
the feasibility of overall water splitting free from charge recombination losses and 
introduces an ideal cocatalyst/photocatalyst structure for efficient water splitting. 


Demonstrating photocatalytic overall water splitting at an internal 
quantum efficiency (IQE) of 100% is an important challenge in the 
study of photocatalysis. Overall water splitting is a greatly uphill (Gibbs 
energy of +237 kJ mol⁻¹) reaction consisting of multiple electron transfer
processes. To achieve a 100% IQE, the first requirement is that all pho- 
toexcited carriers must migrate to surface reaction sites before bulk 
recombination. In addition, two-electron injection for the hydrogen 
evolution reaction (HER) and four-hole injection for the oxygen evolu- 
tion reaction (OER) must proceed consecutively without any backward 
charge transfer. However, because there are many opportunities for 
backward electron transfer, overall water splitting with an external 
quantum efficiency (EQE) greater than 50% has rarely been demon- 
strated, even when using ultraviolet-responsive photocatalysts*’” ”°. 
Thus, it is important to establish whether a 100% IQE can be realized, 
by completely inhibiting backward electron transfer, and conclusively 
determine an effective photocatalyst structure. 
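
To put the uphill Gibbs energy in per-carrier terms, the value of +237 kJ mol⁻¹ quoted above can be divided by the two electrons transferred per H₂ molecule. The sketch below does this arithmetic; the Faraday constant is the standard value, and the function name is hypothetical.

```python
FARADAY = 96485.33212  # C per mole of electrons


def energy_per_electron_ev(delta_g_kj_per_mol, electrons_per_formula_unit):
    """Gibbs energy stored per transferred electron, in eV."""
    return delta_g_kj_per_mol * 1000.0 / (electrons_per_formula_unit * FARADAY)


if __name__ == "__main__":
    # H2O -> H2 + 1/2 O2: two electrons per H2 evolved.
    print(energy_per_electron_ev(237.0, 2))
    # 2 H2O -> 2 H2 + O2: four holes per O2, with twice the Gibbs energy.
    print(energy_per_electron_ev(2 * 237.0, 4))
```

Both lines give about 1.23 eV, the energy that every photoexcited electron-hole pair must deliver to the reaction if none is lost to backward charge transfer.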

SrTiO₃ is a suitable photocatalytic material for the assessment of
this possibility. This compound is a well characterized photocatalyst
with a bandgap energy of 3.2 eV (ref. ""“), and its EQE for overall water
splitting has been improved by up to 69% over the past years using
various refinements**”*. Here, we increased the EQE to its upper limit
by constructing highly active HER and OER cocatalysts on SrTiO₃:Al
particles site-selectively. This was accomplished by using a stepwise 
photodeposition method instead of an impregnation process, which 
results in random dispersion of the cocatalysts. 
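
The bandgap of 3.2 eV fixes the longest wavelength that SrTiO₃ can absorb. As a small worked example, the sketch below converts the bandgap into an absorption-edge wavelength using the exact SI values of the Planck constant, the speed of light and the elementary charge; the function name is hypothetical.

```python
PLANCK = 6.62607015e-34          # J s
LIGHT_SPEED = 299792458.0        # m per s
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def absorption_edge_nm(bandgap_ev):
    """Wavelength (nm) of a photon whose energy equals the bandgap."""
    energy_j = bandgap_ev * ELEMENTARY_CHARGE
    return PLANCK * LIGHT_SPEED / energy_j * 1.0e9


if __name__ == "__main__":
    print(absorption_edge_nm(3.2))
```

The result is an edge near 387 nm, so only ultraviolet light at shorter wavelengths is absorbed strongly by this material.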

The water-splitting activity of SrTiO₃:Al loaded with Rh, Cr and
Co species as cocatalysts via either photodeposition or conventional
impregnation methods is shown in Fig. 1a. The photocatalyst
loaded with Rh (0.1 wt%) and subsequently with Cr₂O₃ (0.05 wt%)
via two-step photodeposition® evolved H₂ and O₂ at the expected
stoichiometric ratio for water splitting (Fig. 1a, left). The additional
photodeposition of CoOOH (0.05 wt%)** onto this sample further
increased the water-splitting activity (Fig. 1a, middle). These photocatalysts,
modified through sequential photodeposition, split water
approximately 2 and 2.5 times faster than the sample loaded with
a Rh-Cr oxide (0.1 wt% of each metal) by coimpregnation? (Fig. 1a,
right). The highest water-splitting activity was obtained with Cr and
Co loadings of 0.05 wt% each (Extended Data Fig. 1a, b). The activity
was enhanced reproducibly and was maintained at 94% for at least
12.5 h (Extended Data Fig. 2). Figure 1b shows the wavelength


¹Research Initiative for Supra-Materials, Shinshu University, Nagano, Japan. ²Graduate School of Science and Technology for Innovation, Yamaguchi University, Ube, Japan. ³Institute of
Engineering Innovation, School of Engineering, The University of Tokyo, Tokyo, Japan. ⁴Nanomaterials Research Institute, National Institute of Advanced Industrial Science and Technology,
Tsukuba, Japan. ⁵Office of University Professors, The University of Tokyo, Tokyo, Japan. e-mail: domen@shinshu-u.ac.jp


Nature | Vol 581 | 28 May 2020 | 411




Fig. 1 | Photocatalytic water-splitting activities. a, Time course of H₂ and O₂
evolution on SrTiO₃:Al loaded with various cocatalysts during
photoirradiation. Left, loaded with Rh (0.1 wt%)/Cr₂O₃ (0.05 wt%) by two-step
photodeposition. Middle, loaded with Rh (0.1 wt%)/Cr₂O₃ (0.05 wt%)/CoOOH
(0.05 wt%) by three-step photodeposition. Right, loaded with Rh (0.1 wt%)-Cr
(0.1 wt%) oxide by co-impregnation. Solid lines are guides for the eye.
b, Ultraviolet-visible diffuse reflectance spectrum of bare SrTiO₃:Al (black
solid line) and wavelength dependence of external quantum efficiency (EQE)
during water splitting on Rh (0.1 wt%)/Cr₂O₃ (0.05 wt%)/CoOOH
(0.05 wt%)-loaded SrTiO₃:Al (red symbols).


dependence of the EQE during overall water splitting using the most
active sample (Fig. 1a, middle) and the ultraviolet-visible diffuse
reflectance spectrum of unmodified SrTiO₃:Al. The EQE values at
350 nm, 360 nm and 365 nm were determined to be 95.7%, 95.9% and
91.6%, respectively; to our knowledge, these are the highest values
reported so far for a water-splitting photocatalyst. The EQE values at
370 nm and 380 nm were decreased to 59.7% and 33.6%, respectively,
as a result of decreased light absorption and probably the lower IQE
at these wavelengths. The IQE obtained on the basis of the number of 
absorbed photons should be close to 100% in the wavelength region 
of 350-360 nm considering that the EQE exceeded 95%, although it 
is difficult to precisely determine the number of photons absorbed 
by the photocatalyst owing to scattering of incident photons towards 
the exterior of the reactor (see Extended Data Table 1 for details). 
The solar-to-hydrogen (STH) efficiency under simulated sunlight 
illumination was 0.65% (Extended Data Fig. 3). 

During the photodeposition process, photoexcited electrons and 
holes migrated to the surfaces of semiconductor particles and reduced 
or oxidized metal ions, respectively, to form deposited nanoparticles. 
The resulting nanoparticles acted as cocatalysts. In this case, a Rh core/ 
Fig. 2 | Location of cocatalysts. a–d, SEM images of SrTiO₃:Al loaded with
various cocatalysts. Unloaded (a) and loaded with Rh (0.1 wt%) (b), Rh (0.1 wt%)/Cr₂O₃
(0.05 wt%) (c) and Rh (0.1 wt%)/Cr₂O₃ (0.05 wt%)/CoOOH (0.05 wt%) (d).
e, STEM-EDS elemental mappings of SrTiO₃:Al loaded with Rh (0.1 wt%)/Cr₂O₃
(0.05 wt%)/CoOOH (0.05 wt%). EDS data were obtained from the area in the red
dashed box.


Cr₂O₃ shell structure was formed during a two-step photodeposition,
through the reduction of Rh³⁺ to Rh⁰ and a subsequent conversion of
Cr(VI)O₄²⁻ to Cr(III)₂O₃ using photoexcited electrons. We note that a
shell made of Cr species is expected to be hydrated in water to form
Cr(III)O₁.₅₋ₘ(OH)₂ₘ·xH₂O, but herein it is denoted as Cr₂O₃ for brevity.
The Rh promotes both the HER and the oxygen reduction reaction
(ORR; a major backward electron transfer process), whereas the Cr₂O₃
shell inhibits only the ORR by blocking the access of evolved O₂ to the
surface of the Rh core. Therefore, the water-splitting rate was almost
independent of the gas phase pressures (Extended Data Fig. 4). The
Co²⁺ ions added as a precursor were oxidatively photodeposited as the
oxyhydroxide Co(III)OOH (ref. °), which promotes the OER. Loading
none of these species or only one of the Rh, Cr, Co, Rh/Co or Cr/Co
components resulted in lower or negligible photocatalytic activity
(Extended Data Fig. 1c).

The microstructure of the photocatalysts modified with cocatalysts
via photodeposition was investigated by electron microscopy.
Figure 2a–d presents scanning electron microscopy (SEM) images of the
samples at each step of the cocatalyst photodeposition. The SrTiO₃:Al
particles were not completely cubic and various non-equivalent facets
were exposed. Rh particles were deposited on specific crystal facets,


Fig. 3 | Transmission electron microscopy. a, b, Selected-area electron
diffraction pattern obtained from SrTiO₃:Al loaded with Rh (0.1 wt%)/Cr₂O₃
(0.05 wt%)/CoOOH (0.05 wt%) (a) and corresponding transmission electron
microscopy image of a particle (b). c, Particle morphology and crystal
orientation.


and subsequent Cr₂O₃ deposition did not change the distribution of
the cocatalyst particles, indicating the formation of Cr₂O₃ shells on Rh
cores. Following the photodeposition of CoOOH, CoOOH nanoparticles
were observed on other crystal facets, separate from the Rh/Cr₂O₃
nanoparticles. This was also confirmed by scanning transmission electron
microscopy and energy-dispersive X-ray spectroscopy (STEM-EDS)
analyses (Fig. 2e). Figure 3a shows the selected-area electron diffraction
pattern obtained from the single-crystalline SrTiO₃:Al particle
shown in Fig. 3b. The Rh/Cr₂O₃ cocatalyst was found to be preferentially
deposited on the {100} crystal facets. The facets on which the
CoOOH cocatalyst was deposited were not clearly defined because of
the exposure of curved facets, but appeared to be primarily located in
the ⟨110⟩ direction, as illustrated in Fig. 3c. Similar phenomena have
been observed for other photocatalyst materials with anisotropic crystal
habits exposing different facets. These observations, together
with the EQE values of over 90%, indicate that photoexcited electrons and
holes migrated to separate crystal facets, so the HER and OER
subsequently occurred on these separate facets.

The observed anisotropic deposition of cocatalysts can be attributed
to a charge rectification effect inside each photocatalyst particle that
is induced by an internal electric field. This field, in turn, originates
from the work function difference between the respective facets, just
as the p-n junction of a solar cell relies on a Fermi level difference.
The effects of work function differences between HER and OER facets
on anisotropic charge separation were simulated using a simplified
two-dimensional model. The electronic energy gradient and charge
distribution in a semiconductor particle with a work function difference
of 0.2 eV between the {110} and {100} surfaces are plotted in Fig. 4a–e.
Increasing the work function difference increases the extent
of anisotropic charge separation, as well as the concentration of
electrons at the {100} facets and of holes at the {110} facets, as seen from
Fig. 4f. A work function difference of 0.2 eV is sufficient for anisotropic
charge separation (see Extended Data Fig. 5 for details). Although it is
not possible at present to experimentally observe the electric field in
a semiconductor particle under actual working conditions, surface
dipoles resulting from unbalanced cation/anion ratios on distinct facets
Fig. 4 | Simulations of photocarrier distributions in SrTiO₃:Al particles.
a–e, Mapping of conduction-band energy, E_c (a); density of electrons (e⁻), n (b);
density of holes (h⁺), p (c); energy band diagram (d); and electron and hole
densities (e) as functions of position (x′, y′) with work function difference
ΔW = 0.2 eV. f, Effect of ΔW on electron-to-hole-density ratio at the {100} and
{110} facets.
may induce this work function difference even without forming junctions
or composites.

The Rh/Cr₂O₃ cocatalyst was reductively photodeposited on
electron-attracting {100} facets, and subsequently mediated electron
transfer from the semiconductor to protons. Similarly, the CoOOH
cocatalyst was oxidatively photodeposited on the hole-collecting {110}
facets and neighbouring facets, and mediated hole transfer. These
phenomena were central to obtaining consecutive charge transfers
with minimum charge recombination for the studied cocatalyst/
photocatalyst system. The concept of separating reduction and oxidation
sites by facet engineering has been reported repeatedly.
However, the EQE achieved for water splitting in those reports was below 1%,
and charge recombination loss was dominant. Therefore, the validity of this
concept has remained controversial. This study demonstrates overall water
splitting with an IQE close to unity by applying aluminium doping for
defect suppression, flux treatment to improve the crystallinity,
and a Cr₂O₃ shell to inhibit the ORR, in addition to facet engineering,
and thus gives a definitive answer to this problem. The selection of
high-performance cocatalysts for the HER and OER is another important
aspect, because prompt HER and OER suppress accumulation of
photoexcited electrons and holes and the resulting recombination.
This work therefore reveals a photocatalyst design that enables almost
complete utilization of photoexcited electrons and holes.

In light-dependent reactions involved in photosynthesis, almost all
the absorbed photons can be used to drive chemical reactions based
on the functions of complex protein structures that enable prompt,
one-way multistep electron transfers. At present, it is not possible to
reproduce such efficient but complicated photosynthesis systems
artificially. However, the particulate semiconductor system developed
in this study can utilize photons at comparable quantum efficiencies
during water splitting despite its simple structure. Recently, Ta₃N₅ and
Y₂Ti₂O₅S₂ have been reported to split water into hydrogen and oxygen
under visible light. These materials absorb visible light with
wavelengths of up to 600 nm and 640 nm, respectively, and the STH
efficiency can reach 10% once the EQE is improved to a level similar to that
of SrTiO₃:Al (refs. *°). The suitable photocatalyst design presented here
should provide impetus to the development of particulate semiconductor
photocatalysts for practical solar hydrogen production from water.
Methods


Synthesis of Al-doped SrTiO₃

SrTiO₃ doped with Al (SrTiO₃:Al) was synthesized using molten-salt
mediation, according to a previously reported method. Briefly, SrCl₂
(Kanto Chemical Co.), Al₂O₃ (Merck) and SrTiO₃ (Wako Pure Chemical
Industries) were mixed by grinding in an agate mortar in a 10:0.02:1
molar ratio. The mixture was subsequently heated in an alumina crucible
at 1,423 K for 10 h in air and then allowed to cool to room temperature.
The product was then separated from residual SrCl₂ and Al₂O₃
and related compounds by washing with distilled water three times,
at which point the supernatant solution was neutral. The amount of Al
incorporated in the SrTiO₃ via this process was slightly lower than one
atom per cent with respect to Ti (ref. 3).


Cocatalyst loading 

Photodeposition was performed in situ by first dispersing SrTiO₃:Al
(0.1 g) in 100 ml of distilled water in the same vessel employed for the
photocatalytic reactions, using a brief sonication. Aqueous solutions of
RhCl₃·6H₂O (Wako Pure Chemical Industries), K₂CrO₄ (Kanto Chemical
Co.) and Co(NO₃)₂·6H₂O (Kanto Chemical Co.) were prepared. We note
that the aqueous solution of RhCl₃·6H₂O should be prepared freshly
before use. The concentration of the metal species (Rh, Cr and Co) was
2 mg ml⁻¹. Calculated amounts of these aqueous solutions were added
into the reaction solution for successive photodeposition. To deposit
0.1 wt% Rh, 0.05 wt% Cr and 0.05 wt% Co on 0.1 g of SrTiO₃:Al, the RhCl₃
aqueous solution (50 µl) was first added to the reaction suspension
with magnetic stirring, and the resulting mixture was irradiated with
a Xe lamp (300 W, full arc; details in Extended Data Fig. 6) for 10 min.
Subsequently, the K₂CrO₄ aqueous solution (25 µl) was added to the
suspension with additional irradiation for 5 min, followed by the addition of
a specific quantity of Co(NO₃)₂ solution and another 5-min irradiation.
We note that photodeposition of these components was performed in
a sequence without exchanging the solvent. A cocatalyst based on a
mixture of Rh-Cr oxides was also deposited on the main catalyst using
a previously reported impregnation method. In this process, SrTiO₃:Al
was dispersed in a small amount of distilled water containing specific
amounts of Na₃RhCl₆ (Mitsuwa Chemicals Co.) and Cr(NO₃)₃ (Kanto
Chemical Co.), after which the mixture was heated on a hot water bath
until dry, followed by heating in air for 1 h at 623 K. Regardless of the
chemical state of the deposited cocatalyst, the loading amount of each
component was calculated assuming a metallic state for simplicity.
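
The stock-solution volumes quoted above follow from simple arithmetic, which can be sketched as below. This is a minimal illustration, assuming the loading is expressed as metal mass relative to the mass of SrTiO₃:Al and that every stock solution contains 2 mg of metal per millilitre; the function name `stock_volume_ul` is hypothetical and not part of the reported procedure.

```python
def stock_volume_ul(support_mass_g, loading_wt_percent, stock_conc_mg_per_ml):
    """Volume of stock solution (microlitres) that delivers the target metal loading.

    Assumes the loading is the metal mass divided by the support mass.
    """
    metal_mass_mg = support_mass_g * 1000.0 * loading_wt_percent / 100.0
    volume_ml = metal_mass_mg / stock_conc_mg_per_ml
    return volume_ml * 1000.0


if __name__ == "__main__":
    # 0.1 g of SrTiO3:Al and 2 mg/ml stock solutions, as stated in the text.
    for metal, loading in [("Rh", 0.1), ("Cr", 0.05), ("Co", 0.05)]:
        print(metal, stock_volume_ul(0.1, loading, 2.0), "microlitres")
```

Under these assumptions the Rh and Cr volumes agree with the 50 µl and 25 µl given in the text; the same calculation would apply to the Co solution, whose volume is not stated explicitly.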


Characterization 

The crystal structure of the SrTiO₃:Al was evaluated by X-ray diffraction
(XRD; Cu Kα radiation, Miniflex 300, Rigaku Co.; see Extended
Data Fig. 7). The optical absorption spectrum of the SrTiO₃:Al (Fig. 1b)
was determined using an ultraviolet-visible spectrometer equipped
with an integrating sphere (V-670, JASCO), employing a Spectralon
block as a reference to adjust the 100% reflectance level and set
the reflectance of the sample holder to 0%. Absorptance was given by
1 − reflectance. The microstructure of the photodeposited samples was
analysed by field-emission SEM (SU-8020, Hitachi High-Technologies
Co.) and by field-emission (scanning) transmission electron microscopy
(JEM-2800F, JEOL) in conjunction with EDS using an X-MAX 100TLE
SDD detector (Oxford Instruments). Selected-area electron diffraction
patterns were indexed using a ReciPro diffraction simulator.


Photocatalytic reactions 

Photocatalytic reactions were carried out in an overhead-irradiation-type
glass vessel connected to a closed gas circulation system. Prior to each
reaction, all air was evacuated from the reaction system, which was then
filled with Ar (about 1 kPa unless otherwise noted). The suspension was
subsequently irradiated using a Xe lamp (300 W, full arc). Evolved gases
accumulated in the closed gas circulation system were analysed by
gas chromatography (GC-8A, Shimadzu Co., thermal conductivity
detector, Ar carrier gas, molecular sieve 5 Å column). The STH efficiency
was measured under simulated sunlight irradiation (AM1.5G, 9 cm²
illuminated area, solar simulator HAL-320, Asahi Spectra Co.). The
STH efficiency was determined according to the following equation


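$$
\mathrm{STH}\,(\%) = \frac{r_{\mathrm{H_2}} \times \Delta G_{\mathrm{H_2O}}}{I \times S} \times 100 \qquad (1)
$$

Here, $r_{\mathrm{H_2}}$, $\Delta G_{\mathrm{H_2O}}$, $I$ and $S$ represent the H₂ evolution rate, the reaction
Gibbs energy of water splitting, the light energy flux and the irradiation
area, respectively.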
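
As a rough illustration of equation (1), the sketch below converts between an H₂ evolution rate and an STH efficiency. It assumes a Gibbs energy of 237 kJ per mole of H₂ and an AM1.5G irradiance of 100 mW cm⁻²; both are standard textbook values rather than numbers stated in the text, and the function names are hypothetical.

```python
DELTA_G_J_PER_MOL = 237_000.0  # assumed Gibbs energy of water splitting per mole of H2
AM15G_W_PER_CM2 = 0.100        # assumed AM1.5G irradiance (100 mW per cm^2)


def sth_percent(h2_rate_mol_per_s, irradiance_w_per_cm2=AM15G_W_PER_CM2, area_cm2=9.0):
    """STH efficiency (%): stored chemical power divided by incident light power."""
    incident_power_w = irradiance_w_per_cm2 * area_cm2
    return h2_rate_mol_per_s * DELTA_G_J_PER_MOL / incident_power_w * 100.0


def h2_rate_for_sth(sth_pct, irradiance_w_per_cm2=AM15G_W_PER_CM2, area_cm2=9.0):
    """H2 evolution rate (mol per second) that corresponds to a given STH efficiency."""
    incident_power_w = irradiance_w_per_cm2 * area_cm2
    return sth_pct / 100.0 * incident_power_w / DELTA_G_J_PER_MOL


if __name__ == "__main__":
    rate = h2_rate_for_sth(0.65)  # STH value reported in the main text
    print("H2 rate in micromol per hour:", rate * 3600.0 * 1e6)
```

With these assumed constants, an STH efficiency of 0.65% over the 9 cm² illuminated area corresponds to an H₂ evolution rate of roughly 89 µmol h⁻¹.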


Measurement of quantum efficiencies 
The EQE values were determined according to the following equation 


$$
\mathrm{EQE}\,(\%) = \frac{2 \times N(\mathrm{H_2})}{N(\mathrm{photons})} \times 100 \qquad (2)
$$

where N(H₂) and N(photons) denote the number of H₂ molecules produced
and the number of photons reaching the surface of the reaction
solution, respectively. The H₂ evolution rate was measured in the same
reaction system as the other photocatalytic reactions. To determine
the wavelength dependence of the EQE, samples were irradiated with
monochromatic light generated by a Xe lamp and sent through bandpass
filters with central wavelengths of 350 nm, 360 nm, 365 nm, 370 nm
or 380 nm (Edmund Optics). The full-width at half-maximum of each
of these bandpass filters was approximately 10 nm. The dependence of
the EQE on the irradiance was established using monochromatic light
at 365 nm passed through a bandpass filter and neutral density filters
(Edmund Optics); see Extended Data Table 1 for details. The numbers
of photons were counted using photon-to-current conversion with a
Si photodiode and a multimeter in the device shown in Extended Data
Fig. 8. The photon flux, I, was calculated from the photocurrent density
generated by a Si photodiode, Photocurrent(Si), measured using the
device shown in Extended Data Fig. 8a–c, according to the equation


$$
I = \mathrm{Photocurrent(Si)} \times \frac{N_\mathrm{A}}{F} \times \frac{1}{\mathrm{QE(Si)}} \qquad (3)
$$

where $N_\mathrm{A}$, $F$ and QE(Si) represent the Avogadro constant, the Faraday
constant and the quantum efficiency of the Si photodiode, respectively. A
cylindrical reactor with an inner diameter of 65 mm and an optically flat
quartz window, together with a Xe lamp (300 W), was employed for water
splitting. Because the Xe lamp emits divergent light, I is not uniform over
the entire light acceptance area. To accurately measure N(photons), a
circular mask with an inner diameter of 30 mm was inserted to narrow
the distribution of the light intensity. Then, I was measured at various
positions by sliding the Si photodiode in steps of
2 mm, with the level of the Si photodiode maintained at the surface of the
reaction solution. This gives I(r), I as a function of the distance from the
centre, r. N(photons) can then be calculated by numerically integrating
I(r) over the entire light acceptance area of the reactor


$$
N(\mathrm{photons}) = I(0) \times \pi r_0^2 + \sum_{k \geq 1} I(2k) \times \pi \left( r_k^2 - r_{k-1}^2 \right) \qquad (4)
$$

$$
r_k = 2k + 1
$$

where k is a non-negative integer. I(r) was taken as the average of the
intensities measured at two points symmetrical to the centre of the
circle (see Extended Data Fig. 8d).
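
The annular integration described above can be sketched as follows. This is only an illustration under stated assumptions: intensities are sampled at r = 0, 2, 4, ... mm, each sample is taken to represent a ring extending 1 mm either side of its radius (a disc of radius 1 mm for the central point), and the function name `integrate_photons` is hypothetical.

```python
import math


def integrate_photons(intensities, step_mm=2.0):
    """Sum I(r) over concentric rings.

    intensities[k] is the photon flux per mm^2 measured at r = k * step_mm.
    Ring k spans from (k * step_mm - step_mm / 2) to (k * step_mm + step_mm / 2);
    the central sample (k = 0) covers a disc of radius step_mm / 2.
    """
    half_step = step_mm / 2.0
    total = 0.0
    for k, intensity in enumerate(intensities):
        outer = k * step_mm + half_step
        inner = max(k * step_mm - half_step, 0.0)
        total += intensity * math.pi * (outer ** 2 - inner ** 2)
    return total


if __name__ == "__main__":
    # Sanity check with a uniform profile: eight samples (r = 0 to 14 mm)
    # cover a disc of radius 15 mm, matching the 30 mm mask aperture.
    uniform = [1.0] * 8
    print(integrate_photons(uniform), math.pi * 15.0 ** 2)
```

For a uniform intensity profile the sum reduces to the intensity multiplied by the aperture area, which gives a quick check that the rings tile the aperture without gaps or overlaps.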


Electrical simulations of SrTiO₃:Al particulate systems
SrTiO₃:Al with a perovskite crystal structure was modelled using an
octagonal geometry based on SEM images of SrTiO₃:Al nanoparticles,
in which the OER and HER proceed at diagonal and non-diagonal edges,
respectively. In this model, we defined the x′ and y′ directions relative
to the non-diagonal {100} and diagonal {110} edges, respectively. The
SrTiO₃:Al material parameters required for electrical simulations were
obtained from the literature and are provided in Extended Data
Table 2 (refs. 25–30). Charge carrier recombination was simulated based on
defect-assisted Shockley-Read-Hall recombination, with the defect
states located in the middle of the bandgap of SrTiO₃:Al. The trapping
times or lifetimes of the electrons and holes were fixed so that the
diffusion lengths of the charge carriers exceeded the particle size, given
that the experimentally observed IQE values were nearly 100%. The
illumination conditions were reproduced by providing a uniform
generation rate across the particle dimensions, because the random
orientation of crystal facets and multiple scattering in the suspension
hindered the applicability of the Beer-Lambert law. However, this
assumption does not qualitatively affect the band diagram or, hence, the
description of anisotropic charge transport. The boundary
conditions were provided by two distinct pseudo Schottky contacts with
work functions W¹⁰⁰ and W¹¹⁰ at the non-diagonal and diagonal edges,
respectively. The W¹⁰⁰ value was adjusted to equal the midpoint
between the redox potentials for H₂ and O₂ evolution and further tuned
in accordance with the pH level of the electrolyte (pH = 7 for water). On
this basis, we obtained W¹⁰⁰ = 4.64 eV relative to the vacuum energy
level at a pH of 7 as per the Nernst equation. However, it is known that
charging or dipole effects can produce differences in the work
functions for the {100} and {110} edges. To include such effects, we fixed
the value of W¹⁰⁰ and varied W¹¹⁰ from 4.64 eV to 5.05 eV. The extraction
rates for electrons (for the HER) and holes (for the OER) were non-zero
for the non-diagonal and diagonal edges, respectively. Using COMSOL
Multiphysics, the model was discretized by physics-controlled meshing
that produced an extremely fine mesh near the edges and a relatively
fine mesh throughout the particle bulk. The COMSOL Multiphysics
Semiconductor Module was used to perform electrical simulations
that solved the Poisson, drift-diffusion and continuity equations
self-consistently for electrons and holes at each discretized node.
Energy band diagrams and electron and hole densities were plotted in
one dimension along the x′ and y′ directions to show their respective
variations from the {100} to {110} edges.
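
The value of 4.64 eV quoted for pH 7 can be reproduced with a short calculation, sketched below. The constants are assumptions that do not appear in the text: an absolute potential of 4.44 V for the standard hydrogen electrode, a standard O₂/H₂O potential of 1.229 V, and a Nernst slope of 0.0592 V per pH unit at 25 °C. The function name is hypothetical.

```python
def midpoint_work_function_ev(ph, she_absolute_ev=4.44, o2_potential_v=1.229,
                              nernst_slope_v_per_ph=0.0592):
    """Midpoint between the H2 and O2 evolution levels, in eV below the vacuum level.

    Both redox levels shift by the same Nernst term with pH, so the midpoint
    shifts by that amount as well.
    """
    ph_shift = nernst_slope_v_per_ph * ph
    h2_level = she_absolute_ev - ph_shift
    o2_level = she_absolute_ev + o2_potential_v - ph_shift
    return 0.5 * (h2_level + o2_level)


if __name__ == "__main__":
    print("Midpoint at pH 7 (eV):", midpoint_work_function_ev(7.0))
```

Under these assumed constants the midpoint at pH 7 comes out at about 4.64 eV, in line with the value used for the {100} contact.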

Extended Data Figure 5 shows the effect of the work function
difference ΔW = W¹¹⁰ − W¹⁰⁰ on the energy band diagram and charge
carrier density under illumination. The accumulation and depletion
regions that appear near the crystal facets are primarily determined
by the relative positions of the work functions of the pseudo metal
contacts (W¹⁰⁰, W¹¹⁰) and of the semiconductor (W). As an example,
in the case of W¹⁰⁰, W¹¹⁰ < W, the semiconductor accumulates
electrons from the pseudo metal contacts at the {100} and {110} facets,
resulting in a flat energy band diagram for ΔW = 0.03 eV (with the
given particle dimensions). This work function difference does not
induce selective transport of charge carriers towards different crystal
facets. However, W¹⁰⁰ < W < W¹¹⁰ produces accumulation regions near
the {100} facets and depletion regions in the vicinity of the {110} facets
when ΔW > 0.1 eV. The formation of these distinct regions produces
downward and upward (asymmetric) band bending around the {100}
and {110} facets, respectively, which leads to anisotropic charge
transport. The extent of asymmetric band bending, and hence anisotropic
charge transport, in the vicinity of the {100} and {110} facets increases
with increasing ΔW. Furthermore, for ΔW < 0.4 eV, we find that the
hole density is less than the electron density in the bulk of the
particulate system owing to substantial electron injection from the pseudo
metal contacts associated with the {100} facets. By contrast, for
ΔW = 0.4 eV, electron injection into the semiconductor bulk is
suppressed by increases in the depletion regions originating from W¹¹⁰
around the {110} facets. As a result, the bulk electron and hole densities
become almost equal.
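
The two regimes described above can be illustrated with a small classification sketch. It assumes that the semiconductor work function equals the electron affinity plus the depth of the Fermi level below the conduction band (4.2 eV + 0.5 eV from Extended Data Table 2), and that the {100} contact is fixed at 4.64 eV; whether the simulation defines the semiconductor work function in exactly this way is an assumption, and the function name is hypothetical.

```python
ELECTRON_AFFINITY_EV = 4.2  # Extended Data Table 2
FERMI_DEPTH_EV = 0.5        # depth of Fermi level below the conduction band
W100_EV = 4.64              # fixed work function of the {100} pseudo contact


def facet_regime(delta_w_ev):
    """Classify the contact regime for a given work function difference.

    Assumes W(semiconductor) = electron affinity + Fermi depth (illustrative only).
    """
    w_semiconductor = ELECTRON_AFFINITY_EV + FERMI_DEPTH_EV
    w110 = W100_EV + delta_w_ev
    if w110 < w_semiconductor:
        return "accumulation at both facets"
    return "accumulation at {100}, depletion at {110}"


if __name__ == "__main__":
    for delta_w in (0.03, 0.1, 0.2, 0.4):
        print(delta_w, facet_regime(delta_w))
```

Under this assumption the semiconductor work function is 4.7 eV, so a difference of 0.03 eV leaves both contacts below it, whereas differences of 0.1 eV or more place the {110} contact above it, consistent with the qualitative picture in the text.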


Data availability 


Supporting data are available at the Shinshu University Institutional 
Repository at http://hdl.handle.net/10091/00021822. 
Extended Data Fig. 3 | Time course of H₂ and O₂ evolution over SrTiO₃:Al
photodeposited with Rh (0.1 wt%)/Cr₂O₃ (0.05 wt%)/CoOOH (0.05 wt%)
under simulated sunlight irradiation.
Extended Data Fig. 4 | Effect of initial background pressure on the
photocatalytic water-splitting activity of SrTiO₃:Al photodeposited with
Rh (0.1 wt%)/Cr₂O₃ (0.05 wt%)/CoOOH (0.05 wt%).
work function between the {110} and {100} facets. a, Maps of conduction as functions of position. See Methods for further discussion.
Extended Data Fig. 8 | EQE measurement. a, Photographs of the devices used
in the measurement. Left, side view of the measurement system. Middle left,
top view of the measurement system. Middle right, arrangement of the lamp
and reactor. Right, arrangement of the Si photodiode. b, c, Illustrations of
water-splitting reactor and photon-counting system (b) and illumination
unit (c). d, Example of light intensity distribution (365-nm bandpass filter and
0.1-optical-density neutral density filter; left) and model used to calculate the
number of incident photons (right).


Extended Data Table 1 | Results of EQE measurements

Wavelength (nm)   OD*   Incident photons   H₂ evolved   O₂ evolved   EQE (%)
350               0     5.26 × 10²⁰        418          203          95.7
360               0     1.27 × 10²⁰        101          54.3         95.9
365               0     1.36 × 10²¹        1060         512          93.2
365               0.3   7.49 × 10²⁰        557          276          89.4
365               0.5   4.84 × 10²⁰        366          177          91.1
365               1.0   1.70 × 10²⁰        131          67.3         92.8
370               0     2.54 × 10²⁰        126          65.2         59.7
380               0     2.06 × 10²⁰        57.3         25.8         33.6

The EQE value at 365 nm was essentially independent of the light intensity over the range
of intensities examined. The average EQE in this series of experiments was 91.6%, which was
used to generate the plot shown in Fig. 1b.

*OD, optical density.
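
The entries of Extended Data Table 1 can be cross-checked with a short sketch. It assumes that the H₂ column is in micromoles and that the photon count refers to the same time interval, and it uses a factor of two electrons per H₂ molecule; these units are inferred rather than stated in the table, and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def eqe_percent(h2_micromol, incident_photons):
    """EQE (%) assuming two electrons are consumed per H2 molecule evolved."""
    h2_molecules = h2_micromol * 1e-6 * AVOGADRO
    return 2.0 * h2_molecules / incident_photons * 100.0


if __name__ == "__main__":
    # 370 nm row of Extended Data Table 1: 2.54e20 photons and 126 units of H2.
    print("EQE at 370 nm (%):", eqe_percent(126.0, 2.54e20))

    # Mean of the four 365 nm entries listed in the table.
    values_365 = [93.2, 89.4, 91.1, 92.8]
    print("Mean EQE at 365 nm (%):", sum(values_365) / len(values_365))
```

Under these assumptions the 370 nm row gives an EQE close to the tabulated 59.7%, and the mean of the four 365 nm entries is close to the 91.6% quoted in the table note.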
Extended Data Table 2 | Parameters used in the calculations
of the energy band diagrams and electron and hole densities
for the SrTiO₃:Al particulate system

Parameter                                      Numerical value
Effective density of states, conduction band   7.94 × 10²⁰ cm⁻³ (ref. 25) ᵃ
Effective density of states, valence band      6.71 × 10²⁰ cm⁻³ ᵃ
Charge carrier mobility                        6 cm² V⁻¹ s⁻¹ (ref. 25)
Dielectric constant                            300 (ref. 26)
Depth of Fermi level from conduction band      0.5 eV (ref. 27)
Bandgap                                        3.2 eV (ref. 27) ᵇ
Electron affinity                              4.2 eV (ref. 28)
Absorption coefficient (α) at 365 nm           1.3 × 10⁴ cm⁻¹ (refs. 29, 30)
Number of incident photons (N) at 365 nm       9.55 × 10¹⁵ cm⁻² s⁻¹ ᵇ
Generation rate (G = αN)                       1.24 × 10²⁰ cm⁻³ s⁻¹
Charge carrier lifetime                        24 ns ᶜ

ᵃThe effective densities of states for the conduction and valence bands were obtained from
the reported effective mass.

ᵇThe numerical values of these parameters were obtained from the experimental observations
and conditions.

ᶜThe charge carrier lifetime used for trap-assisted Shockley-Read-Hall recombination is
calculated so that the diffusion length of the photogenerated charge carriers is greater than
the particle dimensions.

Data from refs. 25–30.
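
The relation between the carrier lifetime and the diffusion length mentioned in the table note can be sketched as below. The calculation uses the Einstein relation with a thermal voltage of 0.02585 V (about 300 K), which is an assumption not given in the table; the mobility and lifetime are the tabulated values, and the function name is hypothetical.

```python
import math

THERMAL_VOLTAGE_V = 0.02585  # kT/q at roughly 300 K (assumed temperature)


def diffusion_length_um(mobility_cm2_per_vs, lifetime_s):
    """Diffusion length in micrometres from the Einstein relation D = mobility * kT/q."""
    diffusivity_cm2_per_s = mobility_cm2_per_vs * THERMAL_VOLTAGE_V
    length_cm = math.sqrt(diffusivity_cm2_per_s * lifetime_s)
    return length_cm * 1e4


if __name__ == "__main__":
    # Mobility of 6 cm^2 V^-1 s^-1 and lifetime of 24 ns from Extended Data Table 2.
    print("Diffusion length (micrometres):", diffusion_length_um(6.0, 24e-9))
```

With these inputs the diffusion length is roughly 0.6 µm; the table note indicates that the lifetime was chosen so that this length exceeds the particle dimensions.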


The ubiquity of tertiary alkylamines in pharmaceutical and agrochemical agents,
natural products and small-molecule biological probes has stimulated efforts
towards their streamlined synthesis. Arguably the most robust method for the
synthesis of tertiary alkylamines is carbonyl reductive amination, which comprises
two elementary steps: the condensation of a secondary alkylamine with an aliphatic
aldehyde to form an all-alkyl-iminium ion, which is subsequently reduced by a hydride
reagent. Direct strategies have been sought for a ‘higher order’ variant of this reaction
via the coupling of an alkyl fragment with an alkyl-iminium ion that is generated
in situ. However, despite extensive efforts, the successful realization of a ‘carbonyl
alkylative amination’ has not yet been achieved. Here we present a practical and
general synthesis of tertiary alkylamines through the addition of alkyl radicals to
all-alkyl-iminium ions. The process is facilitated by visible light and a silane reducing
agent, which trigger a distinct radical initiation step to establish a chain process. This
operationally straightforward, metal-free and modular transformation forms tertiary
amines, without structural constraint, via the coupling of aldehydes and secondary
amines with alkyl halides. The structural and functional diversity of these readily
available precursors provides a versatile and flexible strategy for the streamlined
synthesis of complex tertiary amines.


Carbonyl reductive amination is an effective method for the preparation
of linear tertiary alkylamines; however, the synthesis of branched
variants is more problematic. The condensation of a secondary alkylamine
with a di-alkylketone is often slow and may require the use of activating
reagents; di-alkylketones are also not as readily available as aldehydes
and may require multi-step syntheses. A method by which an alkyl
group could be directly added to an aldehyde-derived alkyl-iminium
ion would circumvent these problems and would increase complexity
by leveraging three, rather than two, programmable feedstocks
into a tertiary alkylamine synthesis. Unfortunately, the most logical
approach to this multi-component strategy, the direct addition of
common organometallic nucleophiles to alkyl-iminium ions, is seldom
successful in delivering the tertiary alkylamine product. Organometallic
reagents such as Grignard and alkyl-lithium reagents are
rarely compatible with in-situ iminium ion formation from alkyl
aldehydes and secondary alkylamines, necessitating the pre-formation
and isolation of an unstable iminium ion. Additionally, the high
basicity of these organometallic reagents results in competitive
deprotonation of the C-H bond adjacent to the carbon-nitrogen double
bond of the iminium ion, substantially restricting the scope of the
process. Less reactive zinc- or cerium-based organometallic compounds
also exhibit limited scope and are restricted to activated alkyl
fragments. The Petasis reaction offers broader scope in the amine
component, but has specific substrate requirements for the carbonyl
and organoboron-derived components. More efficient reactivity has
been demonstrated with C(sp)-nucleophiles, C(sp²)-nucleophiles
and allyl-nucleophiles, or with activating auxiliary-derived imines;
however, the successful deployment of a suitably reactive and generally
available source of unactivated alkyl nucleophile for addition to
an alkyl-iminium ion continues to prove challenging. Considering the
limitations of the organometallic methods, strategies based on the
addition of neutral alkyl-radical species to imine-derivatives have also
been investigated. However, the low electrophilicity of the
carbon-nitrogen double bond means that the use of an activating group
either on the nitrogen atom (R¹) or in the carbonyl component (R²) is
essential to render the imine-derivative sufficiently reactive towards
alkyl-radicals, thereby limiting the scope and practical application of
these reactions with respect to the downstream synthesis of tertiary
amine targets (Fig. 1b).

In our efforts to develop a direct carbonyl alkylative amination (CAA)
method, we proposed that the addition of a neutral carbon-centred
radical to an in-situ-generated, positively charged, alkyl-iminium ion
would obviate the requirement for auxiliary-activated imines and
provide a single-step synthesis of tertiary alkylamines. To the best
of our knowledge, the elementary step comprising the direct
intermolecular addition of an alkyl-radical to an alkyl-iminium ion has not
been reported, with the exception of one example from 1991, in which
an organo-mercury salt was irradiated with ultraviolet (UV) light to
generate an alkyl-radical that was added to a formaldehyde-derived
iminium ion. Addition of a neutral nucleophilic alkyl-radical to an
alkyl-iminium ion would afford an aminium radical cation, a species that
could then be intercepted through hydrogen atom transfer to form the
protonated tertiary alkylamine (Fig. 1c). A radical-based CAA requires
the orchestration of a number of simultaneously occurring events to
Fig. 1 | Evolution of a strategy for carbonyl alkylative amination. a, Addition
of alkyl groups to alkyl-iminium ions remains a challenging transformation.
b, Radical alkylation of auxiliary-activated imine derivatives. c, Carbonyl

form reactive intermediates, each of which is capable of following
competitive and deleterious reaction pathways. For example, it is necessary
to maintain a high concentration of the reactive alkyl-iminium ion in
order to effectively engage the incipient alkyl-radical. Furthermore,
the reagent effecting the hydrogen atom transfer must be capable of
rapidly intercepting the transient aminium radical cation but must not
reduce the iminium ion. Here we report the realization of our hypothesis
through the development of modular and efficient CAA, a method that
combines three abundant feedstocks (secondary amine, aldehyde
and alkyl-halide) in a single step. A key part of this strategy is the use
of visible light to facilitate a radical initiation step under mild
conditions, leading to a practical and general synthesis of complex tertiary
alkylamines.

Our initial investigations focused on a representative reaction
between N-methylbenzylamine (1a), hydrocinnamaldehyde (2a) and
2-iodopropane (3a) (Fig. 2a). Attempts to achieve CAA using classical
protocols for radical generation failed to produce an efficient
reaction. For example, the use of tributyltin hydride (Bu₃Sn-H) and
azobisisobutyronitrile (AIBN) produced trace amounts of alkylamine 4a
(entry 1); this could be modestly improved by using tris(trimethylsilyl)silane
((Me₃Si)₃Si-H) in place of the tin reagent (entry 2). We evaluated
the use of different additives in order to promote the high
concentrations of alkyl-iminium required to effectively intercept the alkyl-radical.
High conversion was achieved in the presence of tert-butyldimethylsilyl
trifluoromethanesulfonate (TBSOTf), as determined by ¹H nuclear
magnetic resonance (NMR) spectroscopy (Supplementary Information 1).
For example, a thermal reaction combining (Me₃Si)₃Si-H, AIBN and
TBSOTf produced the desired alkylamine 4a, but as a 3:1 mixture with
the corresponding reductive amination product 4a′ (entry 3; 4a′ not
shown). Notably, a reaction that combined TBSOTf with Bu₃Sn-H and
AIBN under thermal conditions resulted exclusively in reductive
amination to 4a′ (entry 4). Further exploration of the target transformation
(Supplementary Information 1) led us to identify a set of unique and 
simpler reaction conditions that exploited the effect of visible-light 
activation; after parameter optimization, this approach produced high 
yields of the desired alkylamine 4a (92% assay yield and 80% isolated 
yield) with only trace amounts of reductive amination (entry 5). When 
TBSOTf was omitted, 4a was still obtained in 82% assay yield, providing 
a set of conditions that could be used when acid-sensitive functional 
groups were present. An important aspect of these mild reaction conditions
is the distinct nature of the radical initiation step. Although there
are a number of possible explanations for this phenomenon, homolysis
of the C–I bond seems unlikely given that a control reaction using a
455-nm long-pass filter (which removes the minor UV and near-UV
components of the blue LED lamp that are required for the homolysis)
still generated 4a in 86% yield (entry 7). We then systematically studied
the light-absorbing properties of each component and of likely intermediates
involved in the CAA reaction (secondary amine, alkyl-aldehyde,
alkyl-iodide, enamine, iminium and (Me₃Si)₃Si–H), as well as their
combinations, in order to investigate the possibility of photosensitization
or an electron donor-acceptor complex that could be involved
in radical generation. Although neither any component alone nor
the combination of any two components showed absorption above
or close to 455 nm, the UV-visible absorption spectrum of a ternary
mixture comprising enamine (formed by condensation of 1a and
2a), 2-iodopropane (3a) and (Me₃Si)₃Si–H revealed a new, red-shifted
band (400-500 nm). Although at present we do not have sufficient 
evidence to invoke a specific interaction, this observation potentially 
supports a multi-component interaction, which leads to radical initia- 
tion upon excitation with visible light (Supplementary Information 1). 
In any case, this radical initiation process offers distinct advantages
over other light-mediated processes: whereas previous reports have
been restricted to the formation of carbon-heteroatom bonds involving
the initiating reagent, in this process the (Me₃Si)₃Si–H is not
incorporated in the newly formed bond, thus enabling the generation of
a C(sp³)–C(sp³) bond. Furthermore, the benign nature of the new radical
generation step, in the absence of classical chemical initiators, could aid
the development of straightforward, metal-free radical C(sp³)–C(sp³)
bond-forming reactions from alkyl-halides.

Fig. 3 | Scope of the carbonyl alkylative amination reaction. a, Scope of the aldehyde component. b, Scope of the alkyl halide component. *No TBSOTf used.

Having established a viable protocol, we began an extensive 
investigation of the scope of CAA by first testing its capacity to produce
α-branched cyclic tertiary alkylamines. A wide range of functionalized
saturated cyclic and heterocyclic secondary alkylamines
could be successfully used in this radical-based process (Fig. 2b),
producing the desired alkylamines (4b–4o). Notably, auxiliary-activated
radical-addition methods cannot readily access this class of tertiary
alkylamines, which can be produced in a single step using the new
protocol. We found that N,N-dialkylamines with a range of both linear
and branched functionalized alkyl substituents (including aromatic
heterocycles) gave the tertiary alkylamines 4a and 4p–4ae in good yields.

For certain amines with electron-withdrawing groups close to the 
nitrogen atom, reductive amination was observed as a competing side 
reaction (10% with 4s). N-alkyl anilines and even poorly nucleophilic 
diarylamines could be used effectively (4af–4ai), which expands the
potential scope of the CAA process as these functionalities are ubiquitous
in pharmaceutically relevant molecules. A reaction using N-phenyl
hydroxylamine produced the expected product (4aj) via addition to
the corresponding nitrone intermediate. To demonstrate the compatibility
of this process with features that are typically encountered
in pharmaceutical agents, we showed that a range of drug fragments
could tolerate the reaction conditions, resulting in the desired complex
tertiary amine products in synthetically useful yields (4ak–4aq).

Using a selection of functionalized linear aldehydes, we were able 
to prepare the corresponding tertiary alkylamines (Fig. 3a, 5a–5k).
Branched aldehydes, including those with saturated cyclic and heterocyclic
rings, also performed well, forming the hindered tertiary
alkylamines 5l–5q. Beyond aliphatic aldehydes, we found that formaldehyde
(to 5r), substituted benzaldehydes (to 5s–5v) and heteroaryl
aldehydes (5w–5x) function well in the reaction, and their successful
conversion to amine products considerably expands the scope of CAA. 


Fig. 4 | [...] highlighting the ‘biased’ nature of the products resulting from the activated alkene acceptor. EWG, electron-withdrawing group. c, The distinct mechanistic pathways of alkene hydroaminoalkylation and CAA. d, Structurally diverse tertiary alkylamines formed by CAA in one step from readily available building blocks. *No TBSOTf used.


In the case of primary alkyl iodides, the reaction was aided by the 
addition of 5-10 mol% of the radical chain initiator ethyl
2-iodo-2-methylpropionate, which can generate alkyl radicals more easily
than can primary alkyl iodides. Both simple and functionalized pri- 
mary alkyl iodides proved to be good coupling partners under these 
conditions, with linear alkane fragments added to alkyl-iminium ions 
to form tertiary alkylamines 6a-6n. Although the reaction could 
accommodate various functional groups in the alkyl-halide, slightly 
lower yields were observed for radicals containing proximal electron 
withdrawing groups (6c). This is likely to originate from a competitive 
hydrogen atom transfer between the silane and the alkyl-radical, as 
well as the slower rate of addition of a less nucleophilic alkyl-radical 
to the iminium ion. Benzyl bromides and methoxymethyl bromide, 
the iodides of which are unstable, were found to be suitable alkylat- 
ing agents and gave the amine products in synthetically useful yields 
(6l, 6n). As expected, secondary alkyl-halides were competent substrates
and formed amines 6o–6t in good yields. Tertiary alkyl-iodides were also
excellent coupling partners: the hindered alkylamine products 6u–6w,
which would be challenging to prepare via reductive amination, were
obtained in high yield. A selection of unactivated alkyl-bromides reacted
under the modified conditions (with the radical chain initiator)
to give reasonable yields of the tertiary alkylamines 6p, 6r and 6w;
this provides an alternative protocol if an alkyl-iodide is not available.
We were pleased to find that iodomethane, acting as a source of
methyl-radical, could be added to an iminium ion to provide access to
an important class of branched tertiary alkylamines (6x). In addition to
the 90 examples of successful CAA documented in Figs. 2 and 3, we have 
detailed an extended assessment of the reaction scope (Supplementary 
Fig. 9, Supplementary Information 1) that includes substrates that give 
moderate—but still synthetically useable—yields, as well as examples 
for which the process is low yielding or unsuccessful. 

As a further demonstration of this methodology, we showed 
that the pharmaceutical agent desloratadine, which contains a sec- 
ondary amine, can be used as the amine component of this CAA to 
directly append two interchangeable molecular fragments within the 
α-branched tertiary alkylamine product (Fig. 4a). Tertiary amine derivatives
of desloratadine have recently attracted attention owing to their
improved pharmacokinetic properties, and convey a longer duration
of antagonism at the histamine H1 receptor. Using the standard coupling
conditions, we showed that a collection of desloratadine-derived
α-branched tertiary amine products can be readily prepared in a
modular fashion (8a–g). As a demonstration of the efficacy of these
reactions, we prepared an α-d₂-N-alkyl derivative of desloratadine
(>95% deuterium incorporation, 8g) in a single step from commercial
d₂-paraformaldehyde and 2-iodopropane. Furthermore, a similar reaction
using deuterated iodoethane and a linear aldehyde provided direct access
to a different class of labelled tertiary amine (8h).

Direct CAA on desloratadine is distinct from our previous work on
photoredox-mediated alkylamine synthesis, which was based on alkene
hydroaminoalkylation via the generation and reaction of α-amino
radicals (Fig. 4b, c). Although we previously showed that this secondary 
amine can engage activated alkenes (to 8i), the tertiary alkylamine 
products prepared by that approach always contain the structural 
signature derived from the intrinsic requirements of a reaction that 
necessitates use of an activated alkene acceptor. By contrast, the pre- 
sent mechanistically distinct reaction enables the use of a wide range of 
primary, secondary and tertiary alkyl-halides, providing direct access to 
tertiary alkylamines that are unbiased by any functional requirements 
of the transformation and contain no trace of the activating groups 
required in the readily available building blocks. 

A major advantage of this CAA strategy is its modularity; the vast 
array of coupling partners within each of the three abundant feedstocks 
required for this reaction means that structural and functional diversity 
can be easily programmed into the tertiary amine products (Fig. 4d). 
To illustrate this, we evaluated the cross-compatibility of the process 
by varying each reaction component in order to produce a range of 
complex tertiary alkylamines with diverse structural and functional 
properties. The robust nature of the reaction is reflected by the ease 
with which densely functionalized alkylamines 9a-9g can be produced 
in a single step from readily available building blocks, highlighting the
suitability of such alkylamines for early-stage drug discovery applica- 
tions (Fig. 4d). The streamlined nature by which these products are 
prepared demonstrates the synthetic potential of this process. 

The fossil record of mammaliaforms (mammals and their closest relatives) of the
Mesozoic era from the southern supercontinent Gondwana is far less extensive than
that from its northern counterpart, Laurasia. Among Mesozoic mammaliaforms,
Gondwanatheria is one of the most poorly known clades, previously represented by
only a single cranium and isolated jaws and teeth. As a result, the anatomy,
palaeobiology and phylogenetic relationships of gondwanatherians remain unclear.
Here we report the discovery of an articulated and very well-preserved skeleton of a
gondwanatherian of the latest age (72.1-66 million years ago) of the Cretaceous
period from Madagascar that we assign to a new genus and species, Adalatherium hui.
To our knowledge, the specimen is the most complete skeleton of a Gondwanan
Mesozoic mammaliaform that has been found, and includes the only postcranial
material and ascending ramus of the dentary known for any gondwanatherian. A
phylogenetic analysis including the new taxon recovers Gondwanatheria as the sister
group to Multituberculata. The skeleton, which represents one of the largest of the
Gondwanan Mesozoic mammaliaforms, is particularly notable for exhibiting many
unique features in combination with features that are convergent on those of therian
mammals. This uniqueness is consistent with a lineage history for A. hui of isolation on
Madagascar for more than 20 million years.


Island environments promote evolutionary trajectories among mam- 
mals and other vertebrates that contrast with those on continents, and 
which result in demonstrable anatomical, physiological and behavioural
differences. These differences have previously been ascribed
to markedly distinct selection regimes that involve factors such as
limited resources, reduced interspecific competition and a paucity of
predators and parasites. Although there are numerous examples
of insular effects on mammals of the Cenozoic era, the effects of
long-term isolation on islands are virtually unknown among Mesozoic 
mammaliaforms, and Mesozoic biotas more generally. Here we describe 
and analyse a complete, well-preserved skeleton of a gondwanathe- 
rian mammal (Fig. 1, Extended Data Fig. 1) from the latest Cretaceous 
period of Madagascar, which was then—and still remains—an island. 
This skeleton reveals an array of unusual and even unique adaptations 
that we hypothesize are due to evolution in an insular environment. 
The new specimen, designated University of Antananarivo (UA) 9030, is
the holotype of a new genus and species of gondwanatherian, Adalatherium
hui, which we assign to the new family Adalatheriidae. UA 9030 is so well
preserved that the distalmost caudal vertebrae, tiny phalangeal sesamoids and
even non-osseous tissues (such as costal cartilage) are preserved in articular
relationships. A. hui has an estimated body mass of about 3.1 kg (Extended
Data Fig. 2, Supplementary Information, Supplementary Table 1). As such,
it is the third-largest known mammal represented by anything more than
isolated jaws and teeth from the Mesozoic era of Gondwana, despite the
fact that it is represented by an immature individual.

Gondwanatheria, a clade restricted to Late Cretaceous and Palaeogene
horizons of Gondwana, is particularly poorly represented in the
fossil record. Prior to the discovery reported here, the cranium of Vintana
(also from the latest Cretaceous period of Madagascar) was the only
gondwanatherian represented by more than isolated dental or gnathic
remains. The new fossil greatly expands our knowledge of gondwanatherians
and of Mesozoic mammaliaforms from Gondwana in general, the
fossil record of which is extremely limited relative to that of Laurasia.


Mammalia, Linnaeus 1758
Allotheria, Marsh 1880
Gondwanatheria, Mones 1987
Adalatheriidae Krause, Hoffmann, Wible, and Rougier, 2020, fam. nov.
Adalatherium hui Krause, Hoffmann, Wible, and Rougier, 2020, gen.
et sp. nov.

Fig. 1 | Skull and postcranial skeleton of A. hui holotype (UA 9030). a, ‘Top’ view, as preserved. Scale bar, 5 cm. b, Reconstruction in left lateral view. Left and right sides indicated as (l) and (r), respectively.


Etymology. Adala (Malagasy), ‘crazy’; therium (Latinized form of the 
Greek θηρίον), ‘beast’; the species name hui is in reference to the late
Yaoming Hu for his contributions to our knowledge of early mammals. 
Holotype. UA 9030, skull and postcranial skeleton. 

Type locality and horizon. MAD99-15, Berivotra study area (north- 
western Madagascar). Upper Cretaceous series (Maastrichtian stage, 
72.1-66 million years (Myr) ago), Anembalemba Member, Maevarano 
Formation, Mahajanga Basin. Additional information on the geological
context is provided in the Supplementary Information. 
Diagnosis. A. hui differs from all other known Mesozoic mammalia- 
forms in possessing quadrangular upper postcanine tooth crowns 
with four major cusps and three connecting perimetric ridges mesially, 
lingually and distally that border, on three sides, a central valley that
opens buccally; and lower postcanine tooth crowns with four major
cusps arranged in a diamond pattern and connected by four perimetric
crests, and a prominent mesiobuccal basin on the two distalmost lower
postcanines. The full diagnosis is provided in the Supplementary 
Information. 


Cranium 

The cranium of Adalatherium reveals a mosaic of plesiomorphic and
derived features (Fig. 2a–d, Extended Data Fig. 3, Supplementary Videos
1-3). The presence of a very large internasal vacuity, five infraorbital
foramina, a large foramen in the lacrimal that is not related to the
nasolacrimal duct (probably for the ethmoidal branch of ophthalmic
nerve (V₁) and associated vessels), numerous nasal foramina and a
paranasal sinus that arises from the anterior vestibule of the nasal
cavity are particularly unusual for mammaliaforms. The snout region
shares several features with that of Vintana, including the presence
of a massive lacrimal bone that excludes the frontal from contacting
the maxilla and a large septomaxilla with prominent posterodorsal
and intranarial processes. By contrast, Adalatherium does not possess
several autapomorphic features that are seen in Vintana, including a
massive jugal flange and contact between the premaxillae and palatines.

Fig. 2 | Cranium, lower jaw and dentition of A. hui holotype (UA 9030). a–d, Reconstructed cranium in dorsal (a), ventral (b), right lateral (c) and anterior (d) views. e–g, Reconstructed right lower jaw in lateral (e), dorsal (=occlusal) (f) and medial (g) views. h–k, Micro-computed tomography (μCT) digital renderings of right upper dentition, showing the postcanine teeth (h), distal incisor (i) and mesial incisor (j) in buccal views, and the postcanine teeth in occlusal view (k). l–n, μCT digital renderings of right lower dentition, showing the postcanine teeth (l) and incisor (m) in buccal views, and the postcanine teeth in occlusal view (n). Scale bars, 2 cm (a–g; scale bar above e and f applies to a–g), 5 mm (h–n). PC, upper postcanine tooth; pc, lower postcanine tooth.

Although much of the posterior part of the cranium was severely 
damaged post mortem, the left inner ear of UA 9030 is partially pre- 
served and exhibits several features that were previously unknown 
among mammaliaforms (Extended Data Fig. 4). Most notably, the
primary bony lamina is structurally different from that of therians in 
being single-layered instead of double-layered, and the branches of the 
cochlear nerve appear to have passed along the surface of—rather than 
within—the primary lamina. This unique morphology suggests that 
the primary bony lamina of Adalatherium evolved convergently with 
those of therian mammals. The cochleae of Adalatherium and Vintana 
are unique among mammaliaforms in possessing a secondary bony 
canal that parallels the cochlear ganglion canal, and probably enclosed 
a vascular network. Overall, the cochlear canal is curved through at 
least 210° and possesses, in addition to the primary bony lamina, the 
base of a secondary bony lamina, a cribriform plate, a well-developed 
cochlear ganglion canal and a separate canal for the lagenar nerve; this
last feature is not present in Vintana.


Lower jaw 


The lower jaw of Adalatherium is more complete than in any other 
known gondwanatherian, and is the first to preserve the ascending 
ramus of the dentary (Fig. 2e-g, Extended Data Fig. 5, Supplementary 
Videos 4-7). Among gondwanatherians, the horizontal ramus of Adalatherium
is essentially identical to that of Sudamerica but differs
from that of Galulatherium, primarily in having a stepped differential
in height between the diastema and the postcanine alveolar portion. 
The dentary of Adalatherium is short and deep, and bears a large dias- 
tema between the incisor and postcanine teeth, a prominent pterygoid 
fossa and shelf, and a masseteric fossa that extends anteriorly onto 
the horizontal ramus. There is no evidence of a postdentary trough, 
Meckelian sulcus, coronoid bone or angular process. In these features, 
the dentary of Adalatherium is similar to those found in members of 
Multituberculata, which is a largely Laurasian group. The dentaries 
of euharamiyidans differ from that of Adalatherium in possessing an 
angular process, a coronoid bone and—according to ref. 7°—a ‘reduced’ 
postdentary trough (although this is disputed in ref. ”). The dentary of 
Haramiyavia is much more plesiomorphic than those of both Adalath- 
erium and euharamiyidans in retaining a long and shallow horizontal 
ramus, a fully developed postdentary trough and Meckelian sulcus, 
and in lacking a pterygoid fossa and shelf. The masseteric fossa of
Adalatherium is positioned relatively high dorsally on the dentary, an
apparent autapomorphy. 


Dentition 


The dentition of Adalatherium is unlike that of any known Mesozoic 
mammaliaform (Fig. 2h-n, Supplementary Video 8 for upper post- 
canines). There are two very large, open-rooted upper incisors, each 
of which bears a buccally restricted band of enamel. The size, shape 
and positional relationships of the upper incisors are very similar to 
those discerned from the alveoli of Vintana. The presence of upper
canines in Adalatherium is indicated by tiny elliptical alveoli that are 
separated mesially from the incisors and distally from the postcanines 
by sizeable diastemata. The first upper postcanine is a small, simple, 
two-rooted premolariform tooth. The four more-distal postcanines 
are each supported by five or more roots and are unique among Meso- 
zoic mammaliaforms in bearing four major cusps connected by ridges 
mesially, lingually and distally that border—on three sides—a central 
valley that opens buccally. 

The single lower incisor is large, curved and open-rooted, and bears 
enamel that is largely restricted to the buccal surface. In these features, 
the lower incisor resembles those known for other gondwanatherians 
(except for the enamel-less condition in Galulatherium). Each of the
four lower postcanines has four cusps connected by prominent crests, 
forming a diamond pattern. The most mesial cusp dominates the crown 
on all four teeth. The second lower postcanine bears a mesiobuccal 
bulge, which is developed into a prominent basin on the penultimate 
and ultimate lower postcanines. The first lower postcanine has two
roots, whereas each of the more distal postcanines has at least four. 

The enamel microstructure of Adalatherium consists of relatively
plesiomorphic ‘normal’ radial enamel (Extended Data Fig. 6), typical
of several gondwanatherians from the Late Cretaceous epoch and
Palaeogene subperiod of Argentina. It is unlike the ‘modified’ radial
enamel, with pronounced inter-row sheets of interprismatic matrix,
documented for other gondwanatherians (Lavanify and Vintana) from
the Late Cretaceous epoch of India and Madagascar.


Postcranial skeleton 


UA 9030 includes the only postcranial material assigned to a gondwa- 
natherian, and Adalatherium is only the fourth Mesozoic mammaliaform 
from Gondwana represented by articulated postcranial remains. The
postcranial skeleton exhibits a number of unusual features, including an
anteroposteriorly bowed and mediolaterally compressed tibia, a trochleated
surface on the distal astragalus, a large number of trunk vertebrae
(at least 16 thoracic and 12 lumbar vertebrae), and a short tail (24 vertebrae,
almost all wider than they are long) (Figs. 1, 3, Extended Data Figs. 1, 7, 8).
The long spinous and transverse processes of the thoracic and lumbar
vertebrae suggest the presence of enhanced epaxial (back) musculature.
In the pectoral girdle, a procoracoid bone is absent but a separate coracoid
is well-developed (Fig. 3a, Extended Data Fig. 9a, b). The forelimbs had a
moderately parasagittal posture, as indicated by the ventrally directed glenoid
fossa and the well-developed humeral trochlea (Fig. 3a, d, Extended
Data Fig. 8a, b). By contrast, the asymmetrical medial and lateral condyles
of the femur are suggestive of a more sprawled hindlimb posture. Other
notable features of the pelvis and hindlimbs include the presence of a
large obturator foramen (similar in size to those of therians), an epipubic
bone and a large, separate parafibula (Fig. 3, Extended Data Fig. 9c, d).


Phylogenetic relationships 


Our phylogenetic analysis, performed using 84 cynodont taxa and 
530 morphological characters, places Adalatherium within Gondwa- 
natheria, which in turn is placed within Allotheria as the sister taxon 
to Multituberculata (Extended Data Fig. 10, Supplementary Informa- 
tion). Adalatheriidae (as solely represented by Adalatherium) is recov- 
ered as more derived than Ferugliotheriidae and stemward relative to 
Sudamericidae. 

Previous phylogenetic analyses that include the recently discovered 
Early Cretaceous purported haramiyidan Cifelliodon advanced
the idea that gondwanatherians are nested within Eleutherodontida,
basal to the purported Early Cretaceous hahnodontids Hahnodon
and Cifelliodon. Although Hahnodon was not included in our analysis
because it is represented by only one (or possibly two) isolated teeth,
Cifelliodon is recovered at the base of Allotheria, which also includes
Euharamiyida and ‘Multituberculata + Gondwanatheria’. Our analysis
places the haramiyidans Haramiyavia and Thomasia (along with the
poorly known taxon Megaconus) outside of Mammaliaformes, with no
close relations to allotherians. This finding is in contrast to previous
analyses for Vintana and for Jurassic euharamiyidans.


Evolution in isolation 


Among mammals, the most obvious and quantifiable influences of evolving
on islands are those related to body size. This observation has led
to the articulation of the ‘island rule’, which states that, evolutionarily,
small mammals on islands increase, and large mammals decrease, in
size. In addition, evolution in insular environments is thought to result
in changes in anatomy, physiology, behaviour and life-history strategies,
and (at the faunal level) relatively low species richness, taxonomic imbalance,
high endemism and a general level of ‘primitiveness’. Although
somewhat controversial and clearly not ubiquitous, the island rule is
generally established as a pervasive pattern. Examples of insular
‘dwarfism’ and ‘gigantism’ from Pliocene, Pleistocene and Holocene
epochs abound, including from Madagascar (pygmy hippopotamuses
and giant lemurs). Examples from earlier in the Cenozoic era are
relatively sparse and the effects of long-term isolation are extremely
poorly documented for Mesozoic mammaliaforms.

Fig. 3 | Skeleton of A. hui holotype (UA 9030). a–g, Digitally reconstructed skeleton in left lateral view, highlighting the left scapulocoracoid in lateral view (a), thoracic vertebra 6 and lumbar vertebra 7 in anterior and dorsal views (missing parts mirrored and rendered as semi-transparent) (b), the left femur in distal and anterior views (c), the left humerus in anterior and distal views (d),

Among Mesozoic mammaliaforms, adaptations related to evolution 
in isolation have been most notably claimed for two island environments,
both from the latest part of the Cretaceous period (Maastrichtian
age): (1) the gondwanatherian Vintana from Madagascar and (2) the
multituberculates Barbatodon and Litovoi from the archaic ‘Hateg
Island’ (now part of Romania). Whether Barbatodon and Litovoi
were part of a fauna that developed unique adaptations attributable
to evolution in an insular environment is questionable, as is whether
Hateg Island was even an island (Supplementary Information). Because
of its completeness and undoubted existence in an insular environment,
the skeleton of Adalatherium provides an opportunity to examine 
evolution in isolation among Mesozoic mammaliaforms. 

Fig. 4 | Key stages in plate tectonic history of Madagascar. a, Position of Madagascar before rifting between West Gondwana (South America and Africa) and East Gondwana (Madagascar, Seychelles, Indian subcontinent, Sri Lanka, Antarctica and Australia) at 183 Myr ago (Early Jurassic epoch). b, Separation of Indo-Madagascar from Antarctica and Australia at 124 Myr ago (mid-Early Cretaceous epoch). c, Separation of Indian subcontinent from Madagascar at 88 Myr ago (mid-Late Cretaceous epoch). d, Approximate time of deposition of Maevarano Formation at 66 Myr ago (latest Cretaceous period). Solid black lines indicate current coastlines of Madagascar and east Africa; brown represents Precambrian terranes; and yellow indicates sedimentary basins along west coast of Madagascar. The discovery site of UA 9030 is indicated by the red star in d. Scale bars, 500 km. Maps adapted from Earthworks (www.reeves.nl/gond.com).

Madagascar separated from the Indian subcontinent and the Seychelles
about 88 Myr ago (Fig. 4). As a result, after separation, the
obligate terrestrial taxa of Madagascar evolved in complete isolation
and the only taxa that gained access to the island subsequently were
flying, swimming or rafting forms that were able to disperse across
considerable marine barriers. Madagascar remains today a large,
isolated continental block that is topographically high, geotectonically
stable and at a minimum distance of 430 km from the closest
mainland (Africa).

The postcranial skeleton of UA 9030 indicates that Adalatherium (and
perhaps other gondwanatherians) was neither volant nor aquatic; it was
an obligate terrestrial form that was relatively less capable of dispersal
across marine barriers and more likely to have evolved on Madagascar.
There are at least two other gondwanatherians that lived on Madagascar
contemporaneously with Adalatherium: Lavanify (based on two
fragmentary isolated teeth that are insufficient to be informative in
the current context) and Vintana (based on a complete cranium).
The cranium and upper postcanine dentition of Vintana exhibit several
features that are unknown among Mesozoic mammaliaforms, but
the number of such features in Vintana is far lower than that for
Adalatherium (based on a complete skeleton). Furthermore, given its
deeply nested phylogenetic position within Allotheria (Extended Data
Fig. 10), several additional features of Adalatherium clearly evolved
convergently with those of non-allotherian mammals, particularly
therians (Supplementary Information). Considered together, and
in concert with the palaeogeographical history of Madagascar, we
hypothesize that the unusual morphological attributes of these two
gondwanatherians are due to long-term evolution in isolation in an
island environment, paralleling the cases adduced for various mammals
on Mediterranean islands during the Neogene subperiod.
Furthermore, both Adalatherium and Vintana are large, being among
the largest mammaliaforms known from the entire Mesozoic era of
Gondwana. Vintana, which is even larger than Adalatherium, is the
second-largest known Mesozoic mammaliaform globally and the largest
from Gondwana. Although the fossil record of Mesozoic mammaliaforms
(especially from Gondwana) is too poor to conclusively
establish that the large size of Vintana and Adalatherium reflects island
gigantism, such a conclusion is consistent with the island rule.

The vertebrate fauna associated with Adalatherium (Supplementary 
Table 2) also exhibits unique characteristics reflective of its relictual 
nature. For instance, relatively high numbers of derived features are seen
in other terrestrial members of the latest Cretaceous fauna of Madagas- 
car, including the ceratophryid frog Beelzebufo, the crocodyliform Sim- 
osuchus and the theropods Majungasaurus and Masiakasaurus (ref. *8 
and references therein). None of these genera is known from any other 
landmass, which again attests to high endemicity (although the very 
poor Mesozoic fossil record from Gondwana must be acknowledged in 
this regard). Furthermore, many of these forms had ghost lineages that 
extend back to or before the Early-Late Cretaceous boundary (100 Myr 
ago), thus suggesting that their ancestors had probably arrived on 
Indo-Madagascar before its separation from other Gondwanan landmasses
(that is, via a deep-time vicariance event rather than overwater
dispersal). Despite originating in different ways, we conclude that the 
latest Cretaceous insular vertebrate fauna of Madagascar was probably 
as unique relative to mainland faunas as it is today. 

The currently known latest Cretaceous (Maastrichtian) vertebrate 
fauna of Madagascar, assembled as an island fauna over the course 
of more than 20 Myr (around 88-66 Myr ago), became completely or 
nearly completely extinct—presumably the result of the end-Cretaceous 
bolide impact and/or the penecontemporaneous nearby volcanic erup- 
tions that resulted in the Deccan Traps of India“*. Thereafter, assembly 
of a Madagascan fauna began afresh. The establishment of a new vertebrate
fauna required the arrival of transoceanic dispersers, primarily
from Africa, that encountered habitats that were largely to completely 
devoid of Maastrichtian antecedents** “1. 
Extended Data Fig. 1 | Photographs of the skeleton of A. hui holotype
(UA 9030). a, b, ‘Top’ (a) and ‘bottom’ (b) views, as preserved. The left and right
sides are indicated as (l) and (r), respectively. as, astragalus; at, atlas; av,
anticlinal vertebra; ax, axis; C, cervical vertebra; ca, calcaneus; Ca, caudal
vertebra; cap, capitate; CC, costal cartilage; cl, clavicle; cor, coracoid; cu,
cuboid; dpp, distal pedal phalanx; ent, entocuneiform; ep, epipubis; fe, femur;
fi, fibula; ha, hamate; hu, humerus; i, lower incisor; ID, distal upper incisor; IM,
medial upper incisor; imp, intermediate manual phalanx; ipp, intermediate
pedal phalanx; L, lumbar vertebra; lu, lunate; m, mandible; mc, metacarpal; mt,
metatarsal; na, navicular; osc, os calcaris; pc1, lower first postcanine tooth;
PC1, upper first postcanine tooth; pe, pelvis; pfi, parafibula; pi, pisiform; pmp,
proximal manual phalanx; ppp, proximal pedal phalanx; R, rib; ra, radius; sc,
scapula; sca, scaphoid; stb, sternebra; T, thoracic vertebra; ti, tibia; tr,
triquetrum; ul, ulna.
Extended Data Fig. 2 | Bivariate plots of body mass estimates for A. hui.
a, Relationship between cranial length and body mass in 423 extant mammals,
plus estimated body mass in the gondwanatherians A. hui and Vintana sertichi.
b, Relationship between cranial width and body mass in 423 extant mammals,
plus the estimated body mass in A. hui and V. sertichi. c, Relationship between
cranial size and body mass in 423 extant mammals, plus the estimated body
mass in A. hui and V. sertichi. d, Relationship between humeral length and body
mass in 187 extant therian mammals, plus the estimated body mass in A. hui.
e, Relationship between femoral length and body mass in 184 extant species of
therian mammal, plus the estimated body mass in A. hui. f, Relationship
between stylopodial diaphyseal circumference and body mass as calculated for
a sample of 245 tetrapod species (data points shown for mammals only,
n = 200), plus the estimated body mass in A. hui. Regression lines in a-e are
from ordinary least squares regressions, whereas the regression line shown in f
is from a phylogenetic generalized least squares regression. Measurement
data, methods and references are provided in the Supplementary Information.

[Plot panels not reproduced. Regression equations legible on the panels:
b, Cranial Width and Body Mass: log10 Body Mass = 3.3359907 × log10 Cranial Width (mm) - 5.374578.
d, Humeral Length and Body Mass: log10 Body Mass = 2.7971229 × log10 Humeral Length (mm) - 1.725024.
f, Stylopodium Diaphyseal Circumference and Body Mass: log10 Body Mass = 2.754 × log10 Femoral + Humeral Diaphyseal Circumference (mm) - 1.097.]
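
The regression equations printed on panels b, d and f of Extended Data Fig. 2 can be applied directly to a linear measurement. The following is a minimal sketch in Python, not the authors' workflow: the function name `estimate_log10_body_mass`, the dictionary keys and the 50-mm example input are hypothetical (the input is not a measurement of A. hui), and the body-mass units are those of the original plots, which cannot be recovered from the text here.

```python
import math

# (slope, intercept) pairs as printed on Extended Data Fig. 2b, d and f.
REGRESSIONS = {
    "cranial_width": (3.3359907, -5.374578),
    "humeral_length": (2.7971229, -1.725024),
    "stylopodial_circumference": (2.754, -1.097),
}


def estimate_log10_body_mass(predictor: str, measurement_mm: float) -> float:
    """Return log10(body mass) predicted from a linear measurement in millimetres."""
    if measurement_mm <= 0:
        raise ValueError("measurement must be positive")
    slope, intercept = REGRESSIONS[predictor]
    return slope * math.log10(measurement_mm) + intercept


# Hypothetical input for illustration only: a cranial width of 50 mm.
log_mass = estimate_log10_body_mass("cranial_width", 50.0)
print(f"log10 body mass = {log_mass:.3f}; body mass = {10 ** log_mass:.2f} (plot units)")
```

Because the relationships are linear in log-log space, the estimate must be back-transformed with a power of ten, as in the last line, before it is read as a mass.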
Extended Data Fig. 3 | Cranium of A. hui holotype (UA 9030).
a-e, Photographs of external surfaces of cranium in right lateral (a), left lateral
(b), dorsal (c), ventral (d) and anterior (e) views. a'-e', Labelled µCT images of
cranium in the same views as in a-e, respectively. f, Labelled CT image of
medial view of right side of nasal cavity. aC, alveolus for upper canine; as,
alisphenoid; eo, exoccipital; fr, frontal; ID, distal upper incisor; IM, mesial
upper incisor; ju, jugal; la, lacrimal; mx, maxilla; na, nasal; os/ps,
orbitosphenoid/presphenoid complex; PC, upper postcanine tooth; pe,
petrosal; pmx, premaxilla; pt, pterygoid; smx, septomaxilla; sq, squamosal;
v, vomer.
Extended Data Fig. 4 | Inner ear of A. hui holotype (UA 9030). a, Ventral view
of reconstructed cranium, with petrosal fragment bounded by red line
enlarged in a'. b-e, Reconstructed cochlear canal in dorsomedial (b),
ventrolateral (c) and posteroventromedial (d, e) views, with the view in d being
slightly more posterior and the view in e slightly more medial. In e, only the
medial aspect of cochlear canal in grey is shown, to reveal primary bony lamina
and cochlear nerve foramina. Semi-transparent grey, cochlear canal; yellow,
cochlear nerve; blue, secondary canal.
Extended Data Fig. 5 | Lower jaw of A. hui holotype (UA 9030). Photographs of left dentary in left column; photographs of right dentary in right column.
a,b, Lateral views. c, d, Dorsal (occlusal) views. e, f, Medial views. i, lower incisor; pc, lower postcanine tooth. 
Extended Data Fig. 6| Enamel microstructure of A. hui holotype (UA 9030). 
a-d, Scanning electron micrographs of single postcanine tooth enamel 
fragment sectioned in various planes. a, Transverse section of entire enamel 
band from the enamel-dentine junction (EDJ) to the outer enamel surface 
(OES) (about 0.4-mm thick) showing single layer of radial enamel and absence 
of distinct layer of prismless external enamel. Prism size increases from, on 
average, 2.3 to 2.8 µm from the enamel-dentine junction to the outer enamel
surface. Prisms close to the enamel-dentine junction are intersected by 
interprismatic matrix at slightly higher angles than towards the outer enamel 
surface. b, Transverse section showing the clear distinction between enamel 


prisms and interprismatic matrix. c, Radial section showing radial enamel in 
outer zone with prisms surrounded by interprismatic matrix and some 
cross-sections of prisms showing tubules. d, Radial, but slightly oblique, 
section showing enamel of inner zone with prisms enveloped by interprismatic 
matrix and presence of odontoblastic processes. In this zone, crystallites of 
interprismatic matrix lie almost perpendicular to those of prisms. Prisms rise 
from the enamel-dentine junction at angle of about 45°; this angle is reduced 
only slightly towards the outer enamel surface. IPM, interprismatic matrix; od, 
odontoblastic process; p, prism; tu, tubule. 
Extended Data Fig. 7 | Selected individual vertebrae of A. hui holotype
(UA 9030). Thoracic (T6 and T16), lumbar (L1 and L11) and anterior caudal (Ca8)
vertebrae are depicted in anterior, dorsal, left lateral and ventral views. The left
transverse process of L11 is preserved but was separated from the vertebral
column during preparation and has not been CT scanned. Dotted outlines
represent the shape of preserved left transverse process, and the mirrored
reconstructed right transverse process.
Extended Data Fig. 8 | Limb bone elements of A. hui holotype (UA 9030).
a-p, µCT images. a, b, Left humerus in anterior (a) and posterior (b) views.
c, d, Left ulna in anterior (c) and lateral (d) views. e, f, Left radius in anterior (e)
and lateral (f) views. g, h, Left manus in dorsal (g) and palmar (= ventral) (h)
views. i, j, Left femur in anterior (i) and posterior (j) views. k, l, Left tibia in
anterior (k) and lateral (l) views. m, n, Left fibula in anterior (m) and lateral (n)
views. o, p, Left pes in dorsal (o) and plantar (= ventral) (p) views.
Extended Data Fig. 9 | Pectoral and pelvic girdle elements of A. hui holotype (UA 9030). a-d, µCT images. a, b, Left scapulocoracoid, left and right clavicle and
manubrium in ‘top’ (a) and ‘bottom’ (b) views (as preserved). c, d, Left os coxa and epipubic bone in lateral (c) and medial (d) views. 
Extended Data Fig. 10 | Phylogenetic relationships of A. hui and selected
mammaliaforms. Strict consensus tree of 16 equally parsimonious trees (tree
length = 2,315, consistency index = 0.3015 and retention index = 0.7001)
derived from analysis of 84 cynodont taxa and 530 characters, with multistate
characters unordered and unweighted. Bremer values are listed next to the
[...] Euharamiyida, Gondwanatheria (including Adalatherium) and
Multituberculata, is highlighted in blue. Taxon and character lists, the data
matrix, limitations and assumptions, phylogenetic methods and a more
detailed explanation of the results are provided in the Supplementary
Information.
The statistical test(s) used AND whether they are one- or two-sided
Data collection: Avizo 7 (VSG), 8 (FEI), and 9 (FEI/Thermo-Fisher Scientific); Amira 6 (FEI/Thermo-Fisher Scientific - measurement tool, spline probe tool);
Dragonfly 3.0; Animation Producer in Avizo; Adobe Premiere Pro (Creative Cloud edition); Geomagic Wrap (MeshDoctor and Relax 
functions); Autodesk 3Ds Max 


Data analysis: TNT version 1.1; PAUP* 4.0; JMP version 14 (SAS Institute); Avizo 7 (VSG), 8 (FEI), and 9 (FEI/Thermo-Fisher Scientific); Amira 6 (FEI/
Thermo-Fisher Scientific - measurement tool, spline probe tool); Dragonfly 3.0; Animation Producer in Avizo; Adobe Premiere Pro 
- A description of any restrictions on data availability


The holotypic specimen of Adalatherium hui is reposited in the University of Antananarivo (UA), Madagascar. The data matrix for the phylogenetic analysis has been 
deposited in MorphoBank (http://morphobank.org/permalink/?P3411). The Life Science Identifiers (LSID) for the new family, genus, and species are registered with 
Zoobank (http://zoobank.org). 
Life sciences study design


All studies must disclose on these points even when the disclosure is negative. 


Sample size: Our study entails the description of a new fossil taxon represented by only a single specimen. Comparisons were made with many other taxa
from around the world that are documented in detail in the Supplementary Information section. Comparisons with extant mammals were
conducted in the large, diverse collections of the Department of Zoology, Denver Museum of Nature & Science and the Section of Mammals,
Carnegie Museum of Natural History, Pittsburgh, USA.


Data exclusions: We sampled extinct taxa as broadly as possible, but were limited by the availability of specimens of some taxa for firsthand study. In these
instances, we used casts, 3D prints, CT scan datasets, and photographs.


Replication: Our taxon-character matrix is a derivative of that employed by Krause et al. (2014, Nature) with improvements that are explicitly
documented so as to allow replication.


Randomization: Our study only reports a single specimen; as such, no randomizations were possible.


Blinding: Character matrices for phylogenetic analysis were developed on the basis of independent observation of each taxon.


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems Methods 
n/a | Involved in the study n/a | Involved in the study 
Antibodies ChIP-seq 
Eukaryotic cell lines Flow cytometry 
Palaeontology MRI-based neuroimaging 
Clinical data


Palaeontology 


Specimen provenance: The holotypic specimen was collected from the Upper Cretaceous Maevarano Formation, Mahajanga Basin, Madagascar. The
specimen was collected under a Collaborative Agreement with the University of Antananarivo and various ministries of the
Madagascar government.


Specimen deposition: The University of Antananarivo (UA), Madagascar


Dating methods: No new dates were obtained for this study.
Anat Arzi, Liron Rozenkrantz, Lior Gorodisky, Danit Rozenkrantz, Yael Holtzman,
Aharon Ravia, Tristan A. Bekinschtein, Tatyana Galperin, Ben-Zion Krimchansky,
Gal Cohen, Anna Oksamitni, Elena Aidinoff, Yaron Sacher & Noam Sobel


After severe brain injury, it can be difficult to determine the state of consciousness of a
patient, to determine whether the patient is unresponsive or perhaps minimally
conscious, and to predict whether they will recover. These diagnoses and prognoses
are crucial, as they determine therapeutic strategies such as pain management, and
can underlie end-of-life decisions. Nevertheless, there is an error rate of up to 40% in
determining the state of consciousness in patients with brain injuries. Olfaction
relies on brain structures that are involved in the basic mechanisms of arousal, and
we therefore hypothesized that it may serve as a biomarker for consciousness. Here
we use a non-verbal non-task-dependent measure known as the sniff response to
determine consciousness in patients with brain injuries. By measuring
odorant-dependent sniffing, we gain a sensitive measure of olfactory function. We
measured the sniff response repeatedly over time in patients with severe brain injuries
and found that sniff responses significantly discriminated between unresponsive and
minimally conscious states at the group level. Notably, at the single-patient level, if an
unresponsive patient had a sniff response, this assured future regaining of
consciousness. In addition, olfactory sniff responses were associated with long-term
survival rates. These results highlight the importance of olfaction in human brain
function, and provide an accessible tool that signals consciousness and recovery in
patients with brain injuries.


Sniff responses can be sensory-driven, cognitively driven, or both. 
Sensory-driven sniff responses reflect automatic odorant-driven modifications
in nasal airflow, which are evident in humans and other
animals. Sensory-driven sniff responses have two levels: level 1, odorant
detection, which reflects the changes in nasal airflow in response
to the presence of an odorant; and level 2, odorant discrimination,
which is the differential response to different odorants, such as reduced
nasal airflow when sensing an unpleasant compared with a pleasant
odorant. Cognitively driven sniff responses reflect situational
understanding and/or learning. For example, patients in this study were 
told that they would be presented with odorants. If a patient then modi- 
fies their nasal airflow in response to an empty jar (used as the ‘blank’) 
when it was presented beneath their nose, this implies the possible 
awareness of the jar and/or possible learned anticipation of an odorant. 

We used pleasant (shampoo) and unpleasant (rotten fish) odorants
(which were valence-rated independently) to trigger sensory
sniff responses, and blank presentations to trigger cognitive
sniff responses (Fig. 1a), in 43 patients with disorders of consciousness
(DOC) (Fig. 1, Extended Data Table 1 and Methods). Directly
after each olfactory testing session, the state of consciousness of the
patient was assessed using standard clinical measures to determine
whether the patient had unresponsive wakefulness syndrome
(UWS) (which has also been referred to as a vegetative state), in which
the patient showed no signs of consciousness, or was in a minimally
conscious state (MCS), in which the patient showed inconsistent but
reproducible evidence of consciousness. A total of 146 sessions were
conducted (1-12 sessions per patient; mean ± s.d. = 3.4 ± 3 sessions;
Extended Data Table 1) with inter-session intervals of between 1 and
10 weeks (mean ± s.d. = 2.65 ± 1.7), enabling longitudinal comparisons
for 31 patients. Overall, 73 sessions were conducted for 31 patients in
a MCS and 73 sessions for 24 patients with UWS (16 of the 24 patients
transitioned from UWS to MCS during the study).

We first focused on sensory-driven sniff responses. We compared 
the normalized nasal inhalation volume (Fig. 1b and Methods) of the 
first three nasal inhalations after the presentation of unpleasant and 
pleasant odorants compared with the respiratory baseline during MCS 
and UWS sessions. Given the non-normal distribution of the data for both
MCS and UWS (Shapiro-Wilk tests; first sniff, W > 0.88 for all tests,
P < 0.001 for all tests; second sniff, W > 0.66 for all tests, P < 0.01 for
all tests; third sniff, W > 0.63 for all tests, P < 0.04 for all tests) and the
greater variance in MCS than in UWS (Levene's tests; first sniff, F(1,144) = 3.9,
P = 0.05; second sniff, F(1,144) = 3.3, P = 0.07; third sniff, F(1,144) = 1.6, P = 0.21),
we used nonparametric tests with Bonferroni correction for multiple 
comparisons. At the group level, we observed that whereas the nasal 
Neurobiology, Weizmann Institute of Science, Rehovot, Israel. Loewenstein Hospital Rehabilitation Center, Raanana, Israel. Sackler Medical Faculty, Tel-Aviv University, Tel Aviv, Israel. These
Fig. 1 | Measuring sniff responses in patients with DOC. a, Experimental
design for the olfactory test. Pleasant odorants (purple), unpleasant odorants
(blue) and blank (grey) were presented in a random order, around 10 times
each, for a duration of approximately 5 s. b, A trace of nasal respiration during a
single trial, during which three baseline respirations and three sniffs after the
presentation of the unpleasant odorant were recorded using a nasal cannula
connected to a spirometer and amplifier. The dashed line denotes the odorant
onset and the blue bar represents the duration of the unpleasant odorant. This


inhalation volume was significantly reduced in response to odorants in
MCS sessions, it was not influenced by odorants in UWS sessions. More
specifically, at the first sniff after odorant presentation, the normalized
sniff volume during MCS sessions decreased from the baseline (which
was set to 1) to 0.89 ± 0.18 normalized flow units (NFU) (mean ± s.d.)
in response to a pleasant odorant (median = 0.92 NFU, Z = 5.2,
P < 0.0001, 95% confidence interval = 0.85-0.93, effect-size r = 0.61;
Fig. 2a) and to 0.88 ± 0.2 NFU in response to an unpleasant odorant
(median = 0.94 NFU, Z = 4.7, P < 0.0001, 95% confidence interval = 0.83-0.93,
effect-size r = 0.54; Fig. 2b). This reflects a reduction of about 10%
in nasal airflow that accounts for odorant content in MCS sessions. We
did not, however, observe a significant difference between pleasant
and unpleasant odorants (Z = 0.13, P = 0.90, effect-size r = 0.01; Fig. 2a,
b). In other words, patients showed level-1 (odorant detection) but not
level-2 (odorant discrimination) sensory-driven sniff responses during
MCS sessions. By contrast, we observed no group-level responses
during UWS sessions (normalized nasal inhalation volume, pleasant,
mean ± s.d. = 0.97 ± 0.13 NFU, 95% confidence interval = 0.94-0.999,
median = 0.99 NFU; unpleasant, mean ± s.d. = 0.97 ± 0.12 NFU, 95%
confidence interval = 0.85-0.997, median = 0.98 NFU, difference from
baseline, Z < 2.3 for all tests, P > 0.05 for all Bonferroni-corrected tests,
effect-size r < 0.27 for all tests; Fig. 2a, b). Furthermore, the responses
during UWS sessions were indeed significantly different from those
during MCS sessions (difference between groups for the pleasant odorant,
Z = 3.3, P = 0.0009, effect-size r = 0.39; and unpleasant odorant,
Z = 2.8, P = 0.005, effect-size r = 0.33; Fig. 2a, b). In other words, the
level-1 sensory-driven sniff responses that were evident during MCS
sessions significantly differentiated these sessions as a group from
UWS sessions. Similar results were found for the second sniff after
odorant presentation, but not for the third sniff, indicating a genuine,
transient, odorant-driven response and not a change in state (Fig. 2e, f
and Extended Data Fig. 1). We replicated this analysis, comparing the
trace represents data from patient 37 and is not an illustration. c, Example
traces lacking sniff responses from patient 28 with UWS. Normalized sniff
volumes (mean ± s.d.): pleasant, 1.00 ± 0.08 (11 repetitions); unpleasant,
0.94 ± 0.1 (12 repetitions); blank, 1.00 ± 0.09 (11 repetitions). d, Example traces
of intact sniff responses from patient 37 in a MCS. Normalized sniff volumes:
pleasant, 0.68 ± 0.22 (7 repetitions); unpleasant, 0.56 ± 0.19 (7 repetitions);
blank, 0.84 ± 0.21 (6 repetitions). Traces are shown as mean (line) and s.e.m.
(shaded areas).


odorant-driven sniff response to the first inhalation after the presenta- 
tion of the blank rather than to the respiratory baseline, and obtained 
similar results (Extended Data Fig. 1 and Supplementary Information).
In three controls, we verified that these effects were not a reflection of
odorant trigeminality (that is, stinging odours that activate the trigeminal
nerve) (Extended Data Fig. 2 and Supplementary Information),
tracheostomy of the patient (Extended Data Fig. 3 and Supplementary
Information) or the pattern of brain injury (Extended Data Table 2).
Together, these results indicate that sensory sniff responses occur in
patients in a MCS but not in patients with UWS.
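
The effect sizes quoted with the Z statistics above are consistent with the common convention r = Z/√N, taking N as the 73 sessions in each group. That convention is an assumption here; the actual calculation is defined in the Methods, which are not reproduced in this section. The short Python check below, with the hypothetical helper `effect_size_r`, reproduces the arithmetic for three of the reported values.

```python
import math


def effect_size_r(z: float, n: int) -> float:
    """Effect size r = |Z| / sqrt(N) (assumed convention, not taken from the Methods)."""
    return abs(z) / math.sqrt(n)


N_SESSIONS = 73  # the text reports 73 MCS sessions and 73 UWS sessions

# (comparison, Z as reported, r as reported)
reported = [
    ("MCS, pleasant odorant vs baseline", 5.2, 0.61),
    ("MCS vs UWS, pleasant odorant", 3.3, 0.39),
    ("MCS vs UWS, unpleasant odorant", 2.8, 0.33),
]

for label, z, r_reported in reported:
    r = effect_size_r(z, N_SESSIONS)
    print(f"{label}: computed r = {r:.2f}, reported r = {r_reported}")
```

Because the published Z values are themselves rounded, a recomputed r can differ from the printed one in the last digit for other comparisons.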

Because consciousness fluctuates in DOC, it is common to analyse
the data per session and not per patient. A case in point is patient
4, who started the study with a session in MCS, then deteriorated, 
conducting his subsequent session in UWS, only to later recover and 
conduct a third and final session in MCS (at the time of publication, 
patient 4 walks and talks). If we were to average the three sessions of 
patient 4, this would obscure any differences between MCS and UWS. 
In turn, treating these sessions independently risks overlooking the 
possible interdependence of measurements obtained from the same 
individual (albeit in different states). To address interdependence, we 
reanalysed the data considering only the strongest sniff-response ses- 
sion from each participant who remained unchanged throughout the 
study. This analysis retained 19 patients in a MCS and 8 patients with 
UWS. We again observed a significant odorant-induced sniff response 
in patients in a MCS (sniff 1, pleasant, mean ± s.d. = 0.831 ± 0.18 NFU,
95% confidence interval = 0.75-0.91, median = 0.89 NFU; unpleasant,
mean ± s.d. = 0.829 ± 0.21 NFU, 95% confidence interval = 0.73-0.92,
median = 0.93 NFU; difference from baseline, Z > 2.9 for all tests,
P < 0.004 for all Bonferroni-corrected tests, effect-size r > 0.66 for all
tests), yet no sniff response was observed in patients with UWS (sniff 1,
pleasant, mean ± s.d. = 0.97 ± 0.11 NFU, 95% confidence interval = 0.90-1.04,
median = 0.92 NFU; unpleasant, mean ± s.d. = 0.94 ± 0.06 NFU,
Fig. 2 | The sniff response reflects the level of consciousness in patients
with DOC. a-c, Normalized sniff volume after a pleasant odorant (a; purple),
unpleasant odorant (b; blue) or the blank (c; grey), during UWS sessions (white
fill, left trace and graph; n = 73) and MCS sessions (colour fill, right trace and
graph; n = 73) of the first sniff after stimulus delivery. d, Combined data from a
and b. e, f, Combined data from a and b for the second (e) and third (f) sniff after
odorant delivery (see Extended Data Fig. 1 for separate graphs of pleasant and
unpleasant odorants). Left, each dot represents a session, the flat violin plots
show the distribution, the red lines denote the median and the dashed


95% confidence interval = 0.90-0.98, median = 0.96 NFU, difference 
from baseline, P> 0.05 for all Bonferroni-corrected tests). Moreover, 
patients in a MCS were not only significantly different from baseline, 
but also significantly different from the average UWS value (sniff 1 
across odorants, Z=2.2, P= 0.03, effect-size r= 0.50). Thus, sensory 
sniff responses were evident not only during MCS compared with UWS 
sessions, but also in patients in a MCS compared with patients with UWS.

We next focused on cognitively driven sniff responses. At the group 
level, we observed that whereas nasal inhalation volume was signifi- 
cantly reduced in response to blank presentation in MCS sessions, it was 
uninfluenced by blank presentation in UWS sessions. More specifically, 
during the first sniff after blank presentation, the normalized sniff vol- 
ume in MCS sessions decreased from the baseline (which was set to 1) 
to 0.955 ± 0.13 NFU (mean ± s.d.; 95% confidence interval = 0.93-0.99,
median = 0.96 NFU, Z = 3.25, P = 0.001, effect-size r = 0.38; Fig. 2c). This
reflects a reduction of about 5% in nasal airflow. By contrast, we observed
no response in UWS sessions (blank, mean ± s.d. = 1.00 ± 0.15 NFU, 95%
confidence interval = 0.97-1.03, median = 1.00 NFU, Z = 0.08, P = 0.94,
effect-size r= 0.009; Fig. 2c). Furthermore, the responses during UWS 
sessions were significantly different from the MCS sessions (difference 
horizontal lines denote the baseline value of 1 NFU. Right, data are mean ± s.e.m.
for each distribution. The P values beneath the distributions denote the
difference from a baseline inhalation (that is, the existence of a sniff response).
The P values above the distributions denote the difference in sniff response
across groups. P values were calculated using two-tailed Wilcoxon signed-rank
tests for within-group comparisons and Wilcoxon rank-sum tests for
between-group comparisons corrected for multiple comparisons. Corrected
P values are indicated by an asterisk (*) and uncorrected P values are indicated
by a hash (#) symbol (see Methods).


between groups, Z=2.31, P=0.02, effect-size r= 0.27; Fig. 2c). In other 
words, the cognitively driven sniff responses that were evident during 
MCS sessions significantly differentiated these sessions as a group 
from UWS sessions. This outcome was not evident for the second and 
third sniffs (Extended Data Fig. 1 and Supplementary Information), 
again indicating that this difference between groups was a genuine, 
transient, task-driven response and not a change in state. In summary,
we conclude that similar to the level-1 sensory-driven component of the 
sniff response, the cognitively driven component of the sniff response 
also reflects the state of consciousness of patients with DOC at the 
session level of the group. 

For this analysis to have value not only for basic science but also for 
clinical use, it would need to be informative at the level of an individual 
patient. To make single-patient rather than group judgements, we 
applied a sniff-response threshold that reflects the extent of change in
nasal airflow that constitutes a sniff response within an individual. On
the basis of previous studies in healthy participants (Supplementary
Table 1), we set a threshold for both the sensory-driven level-1 sniff 
response (odorant detection) and the cognitive sniff response to a 
change of more than 15% in the normalized sniff volume between the 
Fig. 3 | The sniff response is associated with the recovery of consciousness
in patients with DOC. The red lines denote a previously published
sniff-response threshold (more than 15% change in magnitude and/or 0.35
s.d.). Dots within the boxed area (white background; bottom right) reflect
sessions without a sniff response; dots outside the boxed area (shaded
background) reflect sessions with a sniff response. a-c, Each dot is a UWS
session (n = 73). Unfilled dots represent sessions of patients who recovered


event and baseline respiration and/or a modulation in sniff volume that 
reflects a shift in the s.d. of more than 0.35 across trials. The threshold 
for a sensory-driven level-2 sniff response (odorant discrimination) 
was set to a change of more than 20% in the normalized sniff volume
between pleasant and unpleasant odorants. Using these criteria, we
observed that 20 out of 31 patients in a MCS had sniff responses during
at least 1 session (19 patients with level-1 sensory-driven sniff responses, 
of whom 13 had cognitively driven sniff responses, 4 had level-2 
sensory-driven sniff responses and 1 had only a strong trend to level-1 
sensory-driven sniff responses yet a significant cognitively driven sniff 
response). This indicates that this measure has a sensitivity of 64.5% 
to determine a MCS (Extended Data Fig. 4). The false-negative rate 
reflects the fact that 35.5% of patients in a MCS had no sniff response,
indicating that a lack of sniff responses does not necessarily indicate 
unconsciousness. 
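
The magnitude criteria described above can be written as a simple rule. The sketch below is an illustration only and not the authors' implementation: the function names are hypothetical, it covers only the percentage-change criteria (more than 15% for level-1 and cognitive responses, more than 20% for level-2, read here as an absolute difference in normalized sniff volume), and it omits the alternative 0.35 s.d. criterion, whose exact computation is not given in this section. The example values are the mean normalized sniff volumes listed for patients 28 and 37 in Fig. 1.

```python
BASELINE = 1.0  # normalized baseline inhalation volume (NFU)


def has_response(mean_volume: float, threshold: float = 0.15) -> bool:
    """True if the mean normalized sniff volume differs from baseline by more than the threshold."""
    return abs(mean_volume - BASELINE) > threshold


def discriminates(pleasant: float, unpleasant: float, threshold: float = 0.20) -> bool:
    """True if pleasant and unpleasant volumes differ by more than the threshold (level 2)."""
    return abs(pleasant - unpleasant) > threshold


# Mean normalized sniff volumes from Fig. 1c (patient 28, UWS) and Fig. 1d (patient 37, MCS).
patients = {
    "patient 28 (UWS)": {"pleasant": 1.00, "unpleasant": 0.94, "blank": 1.00},
    "patient 37 (MCS)": {"pleasant": 0.68, "unpleasant": 0.56, "blank": 0.84},
}

for name, v in patients.items():
    # Assumption: a level-1 response to either odorant counts as odorant detection.
    level1 = has_response(v["pleasant"]) or has_response(v["unpleasant"])
    level2 = discriminates(v["pleasant"], v["unpleasant"])
    cognitive = has_response(v["blank"])
    print(f"{name}: level-1={level1}, level-2={level2}, cognitive={cognitive}")

# Sensitivity for MCS as reported in the text: 20 of 31 patients had a sniff response.
print(f"sensitivity = {20 / 31:.1%}")
```

Under this simplified rule the example traces from patient 28 fall below every threshold, whereas those from patient 37 exceed the 15% threshold for both odorants and for the blank.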

We next analysed the patients with UWS. We observed that 9 out of 
24 patients with UWS nevertheless had a sensory-driven level-1 sniff 
response in at least 1 session. Of these, one patient also had a level-2 
sniff response. Cognitively driven sniff responses were observed in nine 
patients with UWS, eight of whom had a sensory-driven sniff response
(the ninth patient showed only a trend). Taken together, whereas we 
failed to observe a sniff response in UWS sessions at the group level, 10 
out of 24 individual patients with UWS had sniff responses in at least 
1 session. To investigate the implications of the sniff responses that we
observed in the patients with UWS, we compared these results to the 
subsequent clinical progression of the patient over time. Remarkably, 
we observed that all 10 patients with UWS who had a sniff response 
in 1 session or more later transitioned to MCS. Thus, a sniff response 
in UWS indicated transition to MCS with 100% specificity and 62.5% 
sensitivity (10 out of 16 patients with UWS who transitioned) (Fig. 3a-d 
and Extended Data Fig. 5). Furthermore, in 4 of the patients the sniff 
response preceded any other sign of consciousness recovery by days to 
months (2.5 months, around 2 months, about 1.5 months, and 2 days). 
These results were not just dependent on the sniff threshold that we 
used (Extended Data Figs. 6, 7), were not dependent on the odorant 
condition (Extended Data Fig. 8) and did not reflect the duration of 
participation (Supplementary Information). Together, this suggests 
that sniff responses are informative for the prognosis of a patient at 
the single-patient level. 

Lastly, we tested whether sniff responses are informative of the 
long-term outcome for DOC. We conducted a follow-up investigation 
at a common point in time about 5 years after injury of the first participant 
(mean ± s.d. across patients = 37.4 ± 14.7 (range 17–64) months after 
injury). We observed that whereas patients who had a sniff response 
shortly after injury generally survived for years, patients who did not 
have a sniff response after injury generally did not survive during this 
period. More specifically, only 2 out of 24 patients (8.3%) who had a sniff 
response after their injury did not survive (these patients died 5 and 
7 months after the injury). The remaining 22 out of 24 patients (91.7%) 
with a sniff response after injury survived (current mean ± s.d. = 37.3 ± 
14.1 months after injury). By contrast, 12 out of 19 patients (63.2%) who 
did not have a sniff response after injury did not survive (patients died 
within 17.5 ± 12.2 months after injury, median = 13 months; Fig. 4a–d and 
Extended Data Fig. 5). Thus, the sensitivity of the sniff response in predicting 
survival at 37.3 ± 14.1 months after brain injury is 91.7% (χ² = 14.5, 
P = 0.0001, effect-size Cramér’s V = 0.45). Given the strong association 
between sniff threshold and survival, we would like to reiterate that the 
sniff-threshold values were set before we obtained the survival rate 
follow-up data (Supplementary Table 1) and, indeed, this threshold 
was not the best threshold that could be applied (Extended Data Figs. 6, 
7). In addition, we also assessed the functional independence of the 29 
surviving patients using the functional independence measure. This 
measure was independently obtained during clinical assessments of 
these patients that were conducted at 20.4 ± 11.5 months (mean ± s.d.) 
after injury. We observed a significant correlation in which the extent of 
the sensory—but not cognitive—sniff response predicted the later level 
of independence in patients with UWS (pleasant, Spearman’s r = −0.49, 
P = 0.001; unpleasant, Spearman’s r = −0.60, P < 0.0001; blank, Spearman’s 
r = −0.20, P = 0.21 (note that the r values are negative, because 
a larger sniff response is reflected in a lower post-stimulus volume); 
Fig. 4e-g and Extended Data Fig. 5). Furthermore, we noticed that this 
effect was carried by 11 out of 12 surviving patients with UWS who later 
transitioned to MCS. However, in patients in a MCS, no association was 
observed (pleasant, Spearman’s r = 0.30; unpleasant, Spearman’s 
r = 0.24; blank, r = 0.25, P > 0.05 for all Bonferroni-corrected tests). 
This dissociation between MCS and UWS (difference between correlations, 
Fisher’s exact test, pleasant, Z = 4.19, P < 0.001; unpleasant, 
Z = 4.64, P < 0.001; blank, Z = 2.27, P = 0.012) in the long-term predictive 
value of the measurements of functional independence suggests that 
the sniff response may be informative of the basic mechanisms of life 
and consciousness, but may not be equally informative for functional- 
ity beyond this basic level. Together, these findings suggest that the 
sniff response can be used as an accessible diagnostic and prognostic 
tool for the level of consciousness and survival in patients with DOC 
at the single-patient level. 
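
The survival association can be re-derived from the counts stated in the text. The following is a minimal sketch, assuming a 2 × 2 table built from those counts (22 of 24 patients with a sniff response survived; 7 of 19 without one survived) and an uncorrected Pearson χ² test with one degree of freedom. The function name is illustrative and is not taken from the study's code.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With one degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Rows: sniff response present / absent; columns: survived / deceased.
chi2, p_value = chi_square_2x2(22, 2, 7, 12)
survival_with_response = 22 / 24      # fraction of responders who survived
mortality_without_response = 12 / 19  # fraction of non-responders who died
print(f"chi2 = {chi2:.1f}, P = {p_value:.4f}")
```

Under these assumptions the statistic comes out close to the reported χ² = 14.5.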

Olfaction is a primal sensory process in mammals that directly 
targets the limbic brain without a relay through the thalamus. In 
humans, the neuroanatomy of the olfactory system is combined with 
Fig. 4 | The sniff response is associated with long-term survival and 
functional recovery in patients with DOC. The red lines denote the 
sniff-response threshold (more than 15% change in magnitude and/or 0.35 s.d.). 
Dots within the boxed area (bottom right) reflect sessions without a sniff 
response; dots outside the boxed area reflect sessions with a sniff 
response. a–c, Each dot is a DOC session (both MCS and UWS; n = 146); filled 
black dots represent sessions of patients who died during the study (deceased) 
and coloured dots represent sessions of surviving patients (alive; survival 
(mean ± s.d.), 37.3 ± 14.1 months after brain injury). a, Pleasant odorant. 


aspects of olfactory phenomenology and neurodynamics to afford 
olfaction a unique position in consciousness. Olfactory information 
is processed at several levels and, had we relied on higher-order 
olfactory processing, we would probably not have seen a response 
in patients with DOC. The olfactory sniff response, however, is a 
very basic olfactory mechanism that is persistent across species. In 
healthy humans, the olfactory sniff response can persist even without 
conscious awareness in both wake and sleep (it is strongest during 
rapid eye movement sleep). Nevertheless, such sniff responses rely 
on intact olfactory neuroanatomy. In this study, we do not claim that 
this neuroanatomy is the location of consciousness itself, but rather 
that it probably indicates a level of corticothalamic integrity that is 
important for consciousness and, indeed, for life itself. There are 
precedents for powerful biomarkers of consciousness in patients with 
DOC, yet the sniff response is also predictive of survival three or more 
years after injury. The first and last thing that we do in life is to inhale 
and such inhalations can be modulated by smell, and this modulation 
is connected with the most basic processes of life. 

In addition to the neurological mechanisms of consciousness, our 
results have clinical implications. Improvements in emergency medicine 
have increased the survival rates after brain injury but, paradoxically, 
have resulted in increasing numbers of survivors living with DOC. As 
previously reported, the rate of misdiagnosis in these patients could 
be as high as 40%. The sniff response can provide an accessible and 
easy-to-use tool that may considerably improve this outcome. The ease 
of use of this tool is an important feature, as it separates this approach 
from neuroimaging and electrophysiology, which, although 
powerful for the assessment of consciousness, are not always available 
to patients with DOC, particularly in developing countries. 

The sniff response had a specificity of 100% for the recovery of consciousness. 
All patients with a sniff response ultimately showed signs 
b, Unpleasant odorant. c, Blank. d, Percentage of patients with DOC with sniff 
responses who survived (white, 91.7%) or are deceased (D; red, 8.3%) and 
patients with DOC without sniff responses who survived (white, 36.8%) or are 
deceased (red, 63.2%). e–g, Relation between the functional independence 
measure (FIM) and normalized sniff volume. Each dot is a UWS session in a 
surviving patient (n = 41). e, Pleasant odorant. f, Unpleasant odorant. g, Blank. 
Spearman’s correlation coefficients (r) are indicated. P values were calculated 
using Spearman correlations. 


of consciousness and all patients with UWS who remained unconscious 
did not have a sniff response. This places the sniff response among 
the most-suitable analyses for the estimation of the recovery of consciousness 
in patients with DOC. In turn, the sensitivity of the 
sniff response to detect consciousness in patients in a MCS was 64.5%, 
and for the detection of the transition from UWS to MCS was 62.5%. 
These sensitivity rates are higher than most active command-following 
tests and are similar to other passive paradigms and resting-state studies. 
This sensitivity rate, however, suggests that around 35% of 
conscious patients with DOC had no sniff response. The absence of a 
sniff response in conscious patients with DOC may reflect a chronic or 
transient impairment of the olfactory system that is possibly related 
to their brain injury. Indeed, analysis of the structural brain-imaging 
data revealed that seven of the eight patients who had at least one 
session in MCS, but did not have a sniff response, had damage to 
olfaction-related brain structures (Extended Data Table 2). The absence 
of a sniff response in conscious patients with DOC may also reflect an 
inability to execute the precisely timed motor act of sniffing despite 
intact olfaction. On a related note, non-olfactory volitional nasal 
inhalations have been used to assess the conscious brain of paralysed 
individuals in the context of providing a means of communication or 
as a method for device control. Moreover, non-olfactory volitional 
nasal inhalation was also tested as a potential method for the detection 
of consciousness in individuals with DOC; however, most patients with 
DOC did not inhale on command. Nevertheless, one patient in a MCS 
in a previous study was able to inhale on command, but could not 
initiate any other motor movement, further indicating the potential 
use of this motor response in some cases for which other verifications 
of response are unavailable. Taken together, we conclude that patients 
with DOC may not inhale on command, but can sniff in response to 
the presentation of an odorant. This dissociation between volitional 


and odorant-induced sniffing further indicates the fundamental role 
of the olfactory brain in basic mechanisms of arousal. 

Finally, in addition to a couple of technical limitations, which are 
described in the Supplementary Information, we acknowledge a con- 
ceptual limitation that is common to studies such as ours. We claim 
that sniff responses in UWS predicted the later recovery of consciousness. 
An alternative is that these patients were already conscious, but 
had been misdiagnosed by the existing standard clinical assessments. 
The only way to unequivocally settle between these alternatives is by 
self-reporting of the patients who later transitioned into consciousness. 
If these patients provide recollections from when they were deemed 
unconscious, this implies misdiagnosis. Indeed, such instances reveal 
that in some cases in which all available methods, including functional 
magnetic resonance imaging and electroencephalograms, determined 
a lack of conscious awareness, patients nevertheless later recalled 
events that occurred during their erroneously assessed unconscious 
state. Thus, this fundamental question remains unanswered and is 
perhaps impossible to answer with the currently available methods. 
We should emphasize, however, that this does not detract from the 
contribution of our findings. Whether a sniff response in patients with 
UWS reflects a better diagnosis or unique prognosis, in both cases it 
constitutes the signalling of consciousness with all of the associated 
medical and ethical implications. We conclude that the olfactory sniff 
response reflects the state of consciousness and is associated with 
recovery and long-term survival of patients with DOC. 
Methods 


Patients 

In total, 50 patients (age, 43.4 ± 17 years (mean ± s.d.); 9 women) were 
recruited over approximately 4 years at the Intensive Care and Rehabilitation 
of Consciousness Department at the Loewenstein Hospital 
Rehabilitation Center, Raanana, Israel. Patients were tested during multiple 
sessions (range, 1–13 sessions; mean ± s.d., 3.8 ± 2.98 sessions; total, 
190 sessions) separated by days to weeks depending on their clinical 
and personal availability. Out of 190 olfactory testing sessions, 41 sessions 
were excluded due to a lack of stable nasal respiration and 3 sessions 
were excluded due to a nasal inhalation volume that was larger 
than 3.5 s.d. of the group mean (see ‘Inclusion and exclusion criteria’), 
retaining 146 sessions from 43 patients (Extended Data Table 1). The 
study was approved by the ethics committee of Loewenstein Hospital 
Rehabilitation Center. Written informed consent was obtained from 
the legal guardian of the patient. 


Measurement of nasal airflow 

We used methods we have applied extensively in previous studies (refs. 11, 13, 17, 43, 44). 
In brief, during the olfactory testing sessions, the nasal airflow of patients 
was measured using a nasal cannula (1103, Teleflex Medical) linked 
directly to a spirometer (ML141, AD Instruments; H₂O resolution, 15.6 µV) 
that converted airflow into a voltage, which was sent to an instrumentation 
amplifier (PowerLab 16SP Monitoring System, AD Instruments) 
sampling at 1,000 Hz using LabChart software (AD Instruments). 


Odorants 

We used two odorant mixtures (a pleasant ‘shampoo’ and unpleasant 
‘rotten fish’ scent, both from Sensale) that were presented to all patients, 
and two pure odorant molecules (the pleasant phenylethyl alcohol 
(PEA), CAS 102-20-5, which smells like roses; and the slightly unpleas- 
ant decanoic acid, CAS 334-48-5, which smells like crayons; both from 
Sigma-Aldrich) that were presented to a subset of 31 patients (Extended 
Data Fig. 2). These odorants were effective in generating sniff responses 
in previous studies. The odorants were absorbed in a cotton pad 
and placed in a sniff jar. A jar with only a cotton pad served as the blank. 


Procedure 

At the beginning of each session the experimenter explained to the 
patient that odorants will be presented using sniff jars, and that nasal 
respiration will be monitored during the session. This was repeated for 
each session despite the researcher having no indication whether the 
patient heard or understood what was said. Next, the experimenter 
gently applied a nasal cannula to the patient’s nostrils to record nasal 
respiration. If no nasal respiration was observed, the experimenter 
waited for a few minutes before another observation attempt was 
made. If there were still no signs of nasal respiration, the experimenter 
changed the body position of the patient if possible. If no nasal respira- 
tion was observed after the change in position, the olfactory testing 
session was terminated. Lack of nasal respiration in a given session did 
not exclude the patient from future sessions. If nasal respiration was 
detected, the olfactory testing session began. During each trial a jar with 
a pleasant odorant, an unpleasant odorant or the blank was presented 
to the patient for approximately 5 s. The jar was brought into the field 
of view of the patient and was then placed under the nose of the patient 
(without touching the patient) at the exact end of an exhale so that the 
patient would receive the odorant in the following inhale. Each odorant 
and the blank jar were presented around 10 times in a random order 
as long as nasal respiration was evident. In rare cases, due to clinical 
needs or the lack of stable nasal respiration, the session ended before 
the administration of all 10 repetitions. Following each olfactory testing 
session, we obtained a behavioural clinical evaluation of the state 
of consciousness of the patient to determine whether the session was 
in UWS or MCS (including emergence from MCS). 


Behavioural clinical evaluation 

After each olfactory testing session, the state of consciousness of the 
patient was evaluated using the Coma Recovery Scale Revised (CRS-R) 
and/or the Coma-Near Coma (CNC) scale. The CRS-R evaluates the 
presence or absence of responses to auditory, visual, motor, oromotor, 
communication and arousal functions. CRS-R is both a quantitative 
assessment—with scores ranging from 0 (lowest level of consciousness) 
to 23 (highest level of consciousness)—and a qualitative assessment— 
with 4 levels comprising coma, UWS, MCS and emergence from MCS 
for which specific behaviours define each level. The CNC evaluates the 
occurrence of responses to visual, auditory, command following, threat 
response, olfactory, tactile, pain and vocalization. The CNC is both a 
quantitative assessment—with scores ranging from 4 (lowest level of 
consciousness) to 0 (highest level of consciousness)—and a qualitative 
assessment—with 5 levels comprising extreme coma (3.5-4), marked 
coma (2.9-3.49), moderate coma (2.01-2.89), near coma (0.9-2), no 
coma (0-0.89). We converted the CNC qualitative levels, on the basis 
of a previous study, as follows: extreme coma and marked coma, UWS; 
moderate coma and near coma, MCS; no coma, emergence from MCS. In 
addition, all patients were periodically (unrelated to our testing schedule) 
assessed using the Loewenstein Communication Scale (LCS). The 
LCS evaluates five hierarchical functions: mobility, respiration, visual 
responsiveness, auditory comprehension and linguistic skills (verbal or 
alternative). The LCS is quantitative with scores ranging from 0 to 100, 
for which scores of up to 20 are considered UWS and scores above 20 
are considered MCS. Evaluations of the state of consciousness obtained 
directly after the olfactory testing session were missing in three cases 
owing to technical errors and were estimated based on the nearest 
LCS (all three patients were in a MCS). Follow-up functional independence 
of the patients was evaluated using the functional independence 
measure. The functional independence measure is quantitative with 
scores ranging from 18 to 126. 
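
The conversion from scale scores to a state of consciousness described above can be sketched as follows. This is a minimal illustration, assuming that scores falling in the small gaps between the published CNC bands (for example, between 2.89 and 2.9) are assigned by the lower bound of the more severe band; the function names are hypothetical and are not taken from the study's code.

```python
def cnc_to_state(score):
    """Map a Coma-Near Coma score (0 to 4, lower is more conscious) to a state."""
    if not 0 <= score <= 4:
        raise ValueError("CNC scores range from 0 to 4")
    if score >= 2.9:   # extreme coma (3.5-4) or marked coma (2.9-3.49)
        return "UWS"
    if score >= 0.9:   # moderate coma (2.01-2.89) or near coma (0.9-2)
        return "MCS"
    return "emergence from MCS"  # no coma (0-0.89)

def lcs_to_state(score):
    """Map a Loewenstein Communication Scale score (0 to 100) to a state."""
    if not 0 <= score <= 100:
        raise ValueError("LCS scores range from 0 to 100")
    # Scores of up to 20 are considered UWS; scores above 20 are considered MCS.
    return "UWS" if score <= 20 else "MCS"

print(cnc_to_state(3.2), lcs_to_state(35))
```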

We evaluated 21 patients using only CNC and LCS; the state of consciousness 
of the remaining patients was evaluated using CRS-R, CNC 
and LCS. In the cases in which both CRS-R and CNC were used, the level 
of consciousness was determined by the CRS-R. To estimate the effect 
of having only CNC and LCS estimates, we compared the classification of 
UWS and MCS between CRS-R and CNC scales in the current dataset. We 
observed disagreement in 30 out of 80 sessions with both scales (in 29 
sessions CRS-R indicated UWS and the CNC suggested MCS; in one session 
the CRS-R indicated MCS and the CNC suggested emergence from 
MCS). This could suggest that the CNC division to UWS (extreme coma 
and marked coma) and MCS (moderate coma and near coma) using the 
scale subcategories might be too liberal. Out of the 21 patients assessed 
only with CNC scale after the olfactory session, 4 were in UWS during all 
sessions, 12 were in MCS during all sessions, and 5 transitioned between 
UWS and MCS across sessions. If indeed the CNC is too liberal, it is possible 
that some patients in a MCS were in fact UWS. The consequence of such 
misclassification for the group-level analysis is that we could have 
underestimated the observed findings. Thus, we conclude that although 
such misclassification would be unfortunate, it would not weaken, but 
only strengthen our effects. A second possible consequence of the use of 
the CNC scale is the detection of recovery while no recovery occurred. Out 
of the 16 patients who transitioned from UWS to MCS, 5 were assessed using 
the CNC but not with the CRS-R. Notably, the LCS (which was assessed 
independently by the hospital team) provided additional evidence for 
conscious awareness in all five of the patients, thus supporting the 
CNC behavioural assessment. Therefore, the lower sensitivity of CNC 
versus CRS-R in this study may have underestimated the power of the 
results, but does not appear to inaccurately detect the transition to MCS. 


Airflow analysis 
The ongoing respiration trace was filtered as follows. We first applied 
an equiripple low-pass filter of 10 Hz (pass frequency, 10 Hz; stop 


frequency, 20 Hz; allowed ripple amplitude, 1 dB; stop amplitude atten- 
uation, 60 dB; number of coefficients, 224). Next, for the identification 
of inhales and exhales, the hysteresis was applied after filtering the 
signal using an equiripple low-pass filter of 5 Hz (pass frequency, 5 Hz; 
stop frequency, 6 Hz, allowed ripple amplitude, 1 dB; stop amplitude 
attenuation, 60 dB; number of coefficients, 1,975). Discrete inhales 
and exhales were identified on the basis of hysteresis of either up to 
5 mV or 5% of the difference between the minimum and maximum 
values, whichever was the smaller of the two, and a minimum duration 
of 250 ms. If the respiration variability was high, the hysteresis 
value was not constant for the whole session but was calculated using 
a sliding window of 30 s, based on the respiration variance. To account 
for changes in the respiration pattern across a session and between 
sessions, each nasal inhalation following a stimulus was normalized 
by dividing the nasal inhalation volume after the stimulus by the base- 
line inhalation volume (an average of three inhalations before odour 
administration; Fig. 1b). As patients with DOC often breathe through 
a tracheostomy tube, we tested whether tracheostomy modulated the 
results. We found that although tracheostomy significantly reduces 
nasal inhalation (not normalized), it does not modulate normalized 
sniff responses (Extended Data Fig. 3). 
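
Two of the steps above lend themselves to a short illustration: the choice of the hysteresis value and the normalization of each post-stimulus inhalation by its baseline. The following is a minimal sketch, assuming the trace is a list of voltages in mV and that inhalation volumes have already been computed; the function names are hypothetical, and the filtering and sliding-window steps are not reproduced.

```python
from statistics import mean

def hysteresis_threshold(trace_mv, cap_mv=5.0, fraction=0.05):
    """Return the smaller of a 5 mV cap and 5% of the trace's min-max range."""
    signal_range = max(trace_mv) - min(trace_mv)
    return min(cap_mv, fraction * signal_range)

def normalized_sniff_volume(sniff_volume, baseline_volumes):
    """Divide a post-stimulus inhalation volume by the mean baseline volume."""
    baseline = mean(baseline_volumes)
    if baseline <= 0:
        raise ValueError("baseline inhalation volume must be positive")
    return sniff_volume / baseline

# Three baseline inhalations before the odorant, then one sniff after it.
print(normalized_sniff_volume(0.8, [1.0, 1.1, 0.9]))
```

A normalized value below 1 indicates an inhalation that was smaller than baseline, which is how a sniff response appears in this measure.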


Definition of the sniff-response threshold 

At the single-patient level, the sensory-driven level-1 sniff-response 
and cognitively driven sniff-response thresholds were defined based 
on previous studies in healthy participants as: (1) a reduction in 
sniff-response magnitude of 15% or more in relation to baseline res- 
piration (Supplementary Table 1) and/or (2) a sniff-response s.d. 
across all trials in the session of more than 0.35, based on the variabil- 
ity within the data-set (twice the averaged s.d. in MCS sessions). The 
level-2 sniff-response thresholds were defined as (1) a 20% difference 
in sniff-response magnitude between pleasant and unpleasant odorants 
and (2) a reduction in the normalized nasal inhalation volume 
in relation to baseline (<1) for both odorants. More specifically, the 
sniff-response magnitude threshold was based on previous research 
investigating methods of sniff measurements in healthy participants. 
To define the sniff-response threshold, we calculated the change in 
sniff volume averaged across pleasant (phenylethyl alcohol; CAS 
102-20-5, Sigma-Aldrich) and unpleasant (valeric acid; CAS 109-52-4, 
Sigma-Aldrich) odorants in relation to clean air in healthy participants 
(Supplementary Table 1). Sniff volume (integral) values used for the 
calculation were measured using the same method as in our study (nasal 
cannula) and were the following: sniff volume for pleasant odorants, 
0.837; sniff volume for unpleasant odorants, 0.641; nasal inhalation 
volume for clean air, 0.86. This leads to an approximately 15% decrease 
if using the formula [(0.837 + 0.641)/2]/0.86 = 0.8593. 
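
The threshold definitions above can be sketched in code. This is a minimal illustration under stated assumptions: the session-level magnitude is taken as the mean normalized volume across trials, the s.d. is the sample s.d., and the 20% level-2 difference is read as an absolute difference of 0.20 in normalized units. All names are hypothetical and are not taken from the study's code.

```python
from statistics import mean, stdev

MAGNITUDE_THRESHOLD = 0.85  # a reduction of 15% or more relative to baseline
SD_THRESHOLD = 0.35         # s.d. across all trials in the session
LEVEL2_DIFFERENCE = 0.20    # pleasant versus unpleasant difference

def healthy_reference_ratio(pleasant=0.837, unpleasant=0.641, clean_air=0.86):
    """Mean odorant sniff volume relative to clean air in healthy participants."""
    return ((pleasant + unpleasant) / 2) / clean_air

def has_level1_response(normalized_volumes):
    """Criterion (1) and/or criterion (2) for a single session."""
    reduced = mean(normalized_volumes) <= MAGNITUDE_THRESHOLD
    variable = (len(normalized_volumes) > 1
                and stdev(normalized_volumes) > SD_THRESHOLD)
    return reduced or variable

def has_level2_response(pleasant_volumes, unpleasant_volumes):
    """Both criteria: odorants differ by 20% and both fall below baseline."""
    pleasant, unpleasant = mean(pleasant_volumes), mean(unpleasant_volumes)
    differs = abs(pleasant - unpleasant) >= LEVEL2_DIFFERENCE
    return differs and pleasant < 1 and unpleasant < 1

print(round(healthy_reference_ratio(), 4))
```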


Inclusion and exclusion criteria 

Sessions. A session was excluded from the group analysis on the basis of 
the following criteria. (1) Nasal respiration was not evident or unstable, 
and therefore no trials or not enough trials were available (41 sessions). 
Only sessions with more than 15 trials were included in the analysis (7 
patients had no nasal respiration or unstable nasal respiration and were 
therefore not included in the analysis). (2) The averaged normalized 
nasal inhalation volume was larger than 3.5 s.d. of the group mean. 
One session from each of three different patients was excluded using 
this criterion. Notably, all 3 patients had a later score of at least 48 (48, 
62 and 84) in the functional independence measure and at least 49 
(49, 55 and 57) in the LCS, indicating the emergence from MCS. This 
suggests that nasal inhalation could be informative of consciousness 
even in cases of potentially altered olfaction. 

These exclusion criteria retained 43 out of 50 patients with DOC 
(that is, 14% of patients were excluded) (Extended Data Table 1), 146 out 
of 190 sessions (that is, 23.1% of sessions were excluded) and 5,934 out of 
6,106 trials (that is, 2.82% of trials were excluded) in this study. 


Trials. A trial was excluded from a session on the basis of the following 
criteria. (1) Baseline inhalation was unstable, presenting a monotonic 
decrease or increase of at least 40% change in the peak between the 
first and third baseline inhalation, and at least a 25% change between 
the first and second, and between the second and third inhalations. (2) 
No sniff was detected within 6.5 s of odour presentation. (3) No baseline 
inhalation was detected: respiration was too flat or two trials were 
too close in time and therefore there were only three or fewer inhalations 
between trials. (4) There was an extreme change in the inhalation volume 
between baseline and the subsequent inhalations. To identify these 
rare cases (12 trials, 0.19% of all trials), we measured the average and 
s.d. of inhale volumes at baseline and after the stimulus, and the coef- 
ficients of variation (the ratio of the s.d. to mean). A trial was excluded 
if the following three conditions all occurred. (a) The maximum of the 
two coefficients of variation was below 20%; (b) the percentage signal 
change of the two coefficients of variation was below 50%; and (c) the 
percentage signal change of the two means was above 50%. 
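
Criterion (4) combines three conditions, which can be sketched as follows. This is a minimal illustration, assuming that 'percentage signal change' means the absolute difference relative to the baseline value and that the s.d. is the sample s.d.; neither detail is specified in the text, and the function names are hypothetical.

```python
from statistics import mean, stdev

def percent_change(before, after):
    """Absolute change from `before` to `after`, as a percentage of `before`."""
    if before == 0:
        return 0.0 if after == 0 else float("inf")
    return 100.0 * abs(after - before) / abs(before)

def is_extreme_volume_change(baseline_volumes, post_volumes):
    """Flag a trial when conditions (a), (b) and (c) all hold."""
    base_mean, post_mean = mean(baseline_volumes), mean(post_volumes)
    cv_base = 100.0 * stdev(baseline_volumes) / base_mean
    cv_post = 100.0 * stdev(post_volumes) / post_mean
    low_variation = max(cv_base, cv_post) < 20.0                 # (a)
    similar_variation = percent_change(cv_base, cv_post) < 50.0  # (b)
    large_mean_shift = percent_change(base_mean, post_mean) > 50.0  # (c)
    return low_variation and similar_variation and large_mean_shift

# Stable inhalations whose volume doubles after the stimulus are flagged.
print(is_extreme_volume_change([1.0, 1.1, 0.9], [2.0, 2.2, 1.8]))
```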

These exclusion criteria retained 5,778 out of the remaining 5,934 
trials after session exclusion (that is, 2.63% excluded) in this study. 


Baseline inhalations. A baseline inhalation was excluded from the 
averaged baseline inhalation on the basis of the following criteria. (1) 
The baseline inhale started more than 30 s before odour presentation. 
(2) The baseline inhale overlapped with the sniff response of a previous 
trial. (3) The inhalation volume was 25% smaller or larger than the other 
two baseline inhalations in the trial. In this case, the baseline inhala- 
tion with the maximal difference in inhale volumes from the median 
volume was excluded. 

These exclusion criteria retained 16,300 out of 17,334 baseline inhala- 
tions (that is, 5.96% excluded) in this study. 


Sniffs. A sniff was excluded on the basis of the following criteria. (1) 
The normalized sniff volume was beyond ±3.5 s.d. of the averaged sniff in the 
session. (2) The first sniff was excluded if no sniff was detected within 
6.5 s of odour presentation (and then the trial was excluded as well), and 
all other sniffs were excluded if not detected within 6.5 s of the previous 
sniff (and then the later sniffs were also excluded). In 8 patients, the 
respiration rate was slower than typical and the threshold was extended 
(7.5 s in 3 patients, 8.5 s in 3 patients and 11 s in 1 patient). 

These exclusion criteria retained 16,999 out of 17,334 sniffs (that is, 
1.93% excluded) in this study. 
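
The first sniff-exclusion criterion is an outlier filter within a session. A minimal sketch, assuming the sample s.d. and a single pass over the session (the text does not say whether the filter was applied iteratively); the function name is hypothetical.

```python
from statistics import mean, stdev

def drop_outlier_sniffs(normalized_volumes, limit_sd=3.5):
    """Keep sniffs within `limit_sd` standard deviations of the session mean."""
    if len(normalized_volumes) < 2:
        return list(normalized_volumes)
    centre = mean(normalized_volumes)
    spread = stdev(normalized_volumes)
    if spread == 0:
        return list(normalized_volumes)
    return [v for v in normalized_volumes if abs(v - centre) <= limit_sd * spread]

# One extreme sniff among 29 baseline-sized sniffs is removed.
session = [1.0] * 29 + [5.0]
kept = drop_outlier_sniffs(session)
print(len(session) - len(kept))
```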


Sensitivity and specificity 

Specificity (true-negative rate) scores were calculated for patients 
with UWS who remained UWS. Sensitivity (true-positive rate) scores 
were calculated for patients with UWS who transitioned to MCS, for 
the long-term survival rates of a patient and for patients in a MCS. 
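
These definitions can be checked against the patient counts given in the main text. A minimal sketch, assuming that the 8 patients with UWS who did not transition (24 minus 16) are the true negatives; the function names are illustrative.

```python
def sensitivity(true_positives, false_negatives):
    """True-positive rate."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """True-negative rate."""
    return true_negatives / (true_negatives + false_positives)

# 20 of 31 patients in a MCS had a sniff response.
mcs_sensitivity = sensitivity(20, 11)
# 10 of 16 patients with UWS who later transitioned had a sniff response.
transition_sensitivity = sensitivity(10, 6)
# None of the 8 patients who remained in UWS had a sniff response.
transition_specificity = specificity(8, 0)

print(f"{mcs_sensitivity:.1%} {transition_sensitivity:.1%} {transition_specificity:.0%}")
```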


Statistical analysis 

Normalized nasal inhalation volume values were not normally distributed 
(Shapiro–Wilk tests; first sniff, W > 0.88 for all tests, P < 0.001 
for all tests; second sniff, W > 0.66 for all tests, P < 0.01 for all tests; 
third sniff, W > 0.63 for all tests, P < 0.04 for all tests) and displayed 
greater variance in MCS than in UWS sessions (Levene’s tests; first sniff, 
F = 3.9, P = 0.05; second sniff, F = 3.3, P = 0.07; third sniff, F = 16, 
P=0.21). Thus, nonparametric tests were used. For statistical analysis 
between MCS and UWS sessions, Wilcoxon rank-sum tests were used 
and for statistical analysis within each group, Wilcoxon signed-rank 
tests were used. Bonferroni corrections for multiple comparisons were 
applied for comparisons of nasal inhalation volume between MCS 
and UWS sessions and also within a group (two odours x three sniffs: 
0.05/6 = 0.0083). The effect size of nonparametric, dependent samples 
was calculated using the formula r = Z/√n as previously described, 
where Z is the Wilcoxon signed-rank statistic and n is the sample size. 
The effect size for nonparametric, independent samples was estimated 
using Cliff’s δ. The effect size for χ² tests was estimated using Cramér’s 
V. The relation between the sniff responses and functional independence 
measure was assessed using Spearman correlations and was 
Bonferroni corrected for multiple comparisons (two states × three 
sniffs: 0.05/6 = 0.0083). Three sessions from two patients who had 
outlier values in the sniff response in later sessions, suggesting impaired 
olfaction, were excluded from the correlation analysis. When including 
these three sessions similar results were obtained (pleasant, r = −0.27, 
P = 0.075; unpleasant, r = −0.45, P = 0.002; blank, r = −0.04, P = 0.80). 
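
The correction and effect-size formulas named above are simple enough to write out. The following is a minimal sketch using the standard definitions of each quantity; the function names and the example inputs are illustrative and are not data from the study.

```python
import math

def bonferroni_alpha(alpha, n_comparisons):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_comparisons

def wilcoxon_effect_size(z, n):
    """Effect size r = Z / sqrt(n) for a Wilcoxon signed-rank test."""
    return z / math.sqrt(n)

def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) minus P(x < y) over all cross-sample pairs."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))

# Two odours (or two states) by three sniffs gives six comparisons.
print(round(bonferroni_alpha(0.05, 6), 4))
```

Cliff's δ ranges from −1 to 1, with 0 indicating complete overlap between the two samples.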


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


Respiration data that support the findings of this study are available 
from GitLab (https://gitlab.com/liorg/OlfactorySniffingAnalysis/). 
Source data for Figs. 1–4 are provided with the paper. 


Code availability 


Custom code created and used in this study is available from GitLab 
(https://gitlab.com/liorg/OlfactorySniffingAnalysis/). 
Extended Data Fig. 1 | The sniff response reflects the current level of 
consciousness in patients with DOC. Data are displayed by odorant and sniff. 
a-i, Normalized sniff volume after pleasant odorants (a, d, g), unpleasant 
odorants (b, e, h) and blank (c, f, i) during UWS (U) sessions (outline; n = 73) and 
MCS (M) sessions (filled; n = 73) for the first (a-c), second (d-f) and third (g-i) 
sniff after stimulus delivery. Left, each dot represents a session; flat violin plots 
show the distribution; the red lines denote the median; and the dashed 
horizontal lines denote the baseline value at 1 NFU. Right, data are the 
mean ± s.e.m. for each distribution. The P values beneath the distribution 
denote its difference from baseline inhalation. The P values above the 
distributions denote the difference in sniff response across groups. P values 
were calculated using two-tailed Wilcoxon signed-rank tests for within-group 
comparisons and Wilcoxon rank-sum tests for between-group comparisons 
corrected for multiple comparisons. Corrected P values are indicated by an 
asterisk (*) and uncorrected P values are indicated by a hash (#) symbol 
(see Methods). *P < 0.05; #P < 0.05. Further analyses are provided in 
the Supplementary Information. 
patients with DOC. To estimate whether the effects that we observed were 
dependent on the contribution of the trigeminal nerve, we exposed a subset of 
patients to pure olfactory odorants and observed the replication of the effects. 
The normalized sniff volume after exposure to the pure olfactants of the 
pleasant odorant (phenylethyl alcohol; a, c, e) and unpleasant odorant 
(decanoic acid; b, d, f) during UWS sessions (outline; pleasant, n = 56; 
unpleasant, n = 57) and MCS sessions (filled; pleasant, n = 56; unpleasant, n = 56) 

each dot represents a session; flat violin plots show the distribution; the red 
lines denote the median; and the dashed horizontal lines denote the baseline 
value at 1 NFU. Right, data are the mean ± s.e.m. for each distribution. The P 
value beneath the distribution denotes its difference from baseline inhalation, 
that is, the existence of a sniff response. *P < 0.05; two-tailed Wilcoxon test. 
Further analyses are provided in the Supplementary Information. 
Extended Data Fig. 3 | The sniff response is similar with and without
tracheostomy. Normalized sniff volume after pleasant (a, c, e) or unpleasant
(b, d, f) odorants during MCS sessions with (W; n = 44) and without (O; n = 29)
tracheostomy for the first (a, b), second (c, d) and third (e, f) sniff after odorant
delivery. Left, each dot represents a session; flat violin plots show the
distribution; the red lines denote the median; and the dashed horizontal lines
denote the baseline value at 1 NFU. Right, data are the mean ± s.e.m. for each
distribution. The P value beneath the distribution denotes its difference from
baseline inhalation, that is, the existence of a sniff response. *P < 0.05;
two-tailed Wilcoxon test. About 60% of MCS sessions and 80% of UWS sessions
were conducted in patients with a tracheostomy. Although tracheostomy
significantly reduces nasal airflow, a measurable portion of nasal airflow
remains. For example, we note that the raw data in Fig. 1c, d were obtained with
a tracheostomy. Further analyses are provided in the Supplementary
Information.
Extended Data Fig. 4 | The sniff response during MCS sessions. The red lines
denote the sniff-response threshold (more than 15% change in magnitude and/or
0.35 s.d.): dots within the lines (white background) reflect sessions without a
sniff response; dots beyond the lines (shaded background) reflect sessions
with a sniff response. a-c, Each dot is a MCS session. a, Pleasant odorant.
b, Unpleasant odorant. c, Blank. d, Percentage of patients in a MCS
(not sessions) with sniff responses (white, 64.5%) and without sniff responses
(red, 35.5%) across all three conditions.
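
The threshold rule in this caption can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code. It assumes that the 15% criterion applies to how far the mean normalized sniff volume of a session deviates from the 1 NFU baseline, and that the 0.35 s.d. criterion applies to the trial-to-trial standard deviation of normalized sniff volumes within that session. The function name, parameter names and example values are hypothetical.

```python
from statistics import mean, stdev


def has_sniff_response(normalized_volumes,
                       magnitude_threshold=0.15,
                       variability_threshold=0.35):
    """Classify one session (hypothetical helper, assumptions as stated above).

    normalized_volumes: sniff volumes for the trials of a session, in
    normalized flow units (baseline = 1 NFU).
    """
    average = mean(normalized_volumes)
    # A single trial has no spread; treat its variability as zero.
    spread = stdev(normalized_volumes) if len(normalized_volumes) > 1 else 0.0
    magnitude_change = abs(average - 1.0)
    # "and/or" in the caption: either criterion is sufficient.
    return magnitude_change > magnitude_threshold or spread > variability_threshold


# Made-up sessions for illustration only.
reduced_sniffs = [0.80, 0.75, 0.85]   # mean deviates by 0.20 NFU
stable_sniffs = [1.00, 0.98, 1.02]    # near baseline, low variability
variable_sniffs = [1.50, 0.50, 1.00]  # mean at baseline, high variability
```

Under these assumptions the first and third sessions would count as showing a sniff response and the second would not.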
Extended Data Fig. 5 | The sniff response is prognostic for the recovery of
consciousness and long-term survival in patients with DOC. Data are shown
by patient rather than by session. The red lines denote the threshold of a sniff
response (more than 15% change in magnitude and/or 0.35 s.d.). a-c, Each dot
is a session with the strongest sniff response of a patient in a MCS (n = 19).
a, Pleasant odorant. b, Unpleasant odorant. c, Blank. d, Percentage of patients
in a MCS with sniff responses (white, 64.5%) and without sniff responses (red,
35.5%) across all three conditions. e-g, Each dot is a session with the strongest
sniff response of a patient with UWS; empty dots represent patients who later
recovered (transitioned to MCS; n = 16) and filled dots represent patients who
did not recover and remain unconscious (n = 8). e, Pleasant odorant.
f, Unpleasant odorant. g, Blank. h, Percentage of patients with UWS who later
transitioned to MCS (left, recovered) and who remain unconscious (right,
unrecovered) with sniff responses (white; recovered, 62.5%; unrecovered, 0%)
and without sniff responses (red; recovered, 37.5%; unrecovered, 100%) across
all three conditions. i-k, Each dot is a patient with DOC (MCS and UWS; n = 43);
filled black dots represent patients who died during the study and coloured
dots represent patients who survived during the study (mean ± s.d.,
37.3 ± 14.1 months after brain injury). i, Pleasant odorant. j, Unpleasant odorant.
k, Blank. l, Percentage of patients with DOC with sniff responses (left) who
survived (white, 91.7%) and who are deceased (D) (red, 8.3%) and of patients
with DOC without sniff responses who survived (white, 36.8%) and are
deceased (red, 63.2%). m-o, Relation between the functional independence
measure and normalized sniff volume. Each dot is a patient with UWS who
survived during the study (n = 12). m, Pleasant odorant. n, Unpleasant odorant.
o, Blank. r represents Spearman correlation.
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Extended Data Fig. 6 | Dependence of the predictive value on sniff-response 
thresholds in patients with UWS. a, The receiver-operating characteristic 
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the Supplementary Information. 
Extended Data Fig. 7 | Dependence of the sensitivity and specificity on
sniff-response thresholds. a, Sensitivity (true-positive rate (TPR)) for a range
of sniff-response volume and sniff-response volume variability thresholds.
b, Specificity (true-negative rate (TNR)) for a range of sniff-response volume
and sniff-response volume variability thresholds. c, Distance from a
randomized predictor for a range of sniff-response volume and sniff-response
volume variability thresholds. Further analyses are provided in
the Supplementary Information.
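
As a worked illustration of the two rates named in this caption, the sketch below computes sensitivity and specificity at a single threshold. The counts are an assumption derived from the percentages in Extended Data Fig. 5h (62.5% of 16 recovered patients is taken as 10 patients with a sniff response; 0% of 8 unrecovered patients is taken as 0), and "positive" is taken to mean later recovery. The function name is hypothetical, and the sweep over thresholds shown in the figure is not reproduced here.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (TPR, TNR) from the four cells of a 2 x 2 confusion table."""
    tpr = true_pos / (true_pos + false_neg)  # sensitivity
    tnr = true_neg / (true_neg + false_pos)  # specificity
    return tpr, tnr


# Counts inferred from Extended Data Fig. 5h (an assumption, see above):
# recovered patients: 10 with a sniff response, 6 without;
# unrecovered patients: 0 with a sniff response, 8 without.
tpr, tnr = sensitivity_specificity(true_pos=10, false_neg=6,
                                   true_neg=8, false_pos=0)
print(f"sensitivity = {tpr:.1%}, specificity = {tnr:.1%}")
```

Repeating this calculation for each pair of volume and variability thresholds would give one point per threshold pair, which is the kind of grid the panels summarize.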
Extended Data Fig. 8 | Dependence of the predictive value on the relation
between conditions. a, ROC for a range of differences in the sniff-response
volume between the unpleasant odorant and the blank. b, The true-positive
and true-negative rates for a range of differences in the sniff-response volume
between the unpleasant odorant and the blank. c, ROC for a range of
differences in the sniff-response volume between the pleasant odorant and the
blank. d, The true-positive and true-negative rates for a range of differences in
the sniff-response volume between the pleasant odorant and the blank. e, ROC
for a range of differences in the sniff-response volume between the pleasant
and unpleasant odorants. f, The true-positive and true-negative rates for a
range of differences in the sniff-response volume between the pleasant and
unpleasant odorants. n = 24. Further analyses are provided in
the Supplementary Information.


Extended Data Table 1 | Patient information

Patient  Consciousness  Age  Gender  Etiology   Time since  Mean inter-session  Number of  Number of
number   state at                               onset       interval (weeks)    included   sessions
         enrolment                              (months)                        sessions   (all)
1        MCS            21   M       TBI        1           2.7                 3          3
3        MCS            20   F       ABI        3.5         NA                  1          2
4        MCS            59   M       TBI        4           3.5                 3          3
5        UWS            26   F       TBI        75          3.1                 3          3
6        UWS            27   M       TBI        3           2.9                 5          5
7        UWS            23   M       TBI        1.5         1.7                 9          9
8        UWS            38   M       ABI        2           1.3                 12         12
9        UWS            19   M       ABI        2.5         1.4                 10         10
11       UWS            30   M       CVA        3           1.7                 12         13
12       MCS            44   M       ABI        3.5         NA                  1          ul
13       MCS            42   M       CVA        2.5         3.7                 2          2
14       UWS            57   M       TBI        2.5         ui                  9          11
16       UWS            55   M       TBI        3.5         NA                  1          3
17       UWS            42   M       TBI        5.5         NA                  1          3
18       MCS            34   M       TBI        9.5         3.6                 1          3
19       MCS            21   M       TBI        7           6.4                 3          5
20       UWS            18   M       TBI        7           9.6                 2          7
21       UWS            60   M       TBI        3.5         1.0                 2          3
22       UWS            45   M       TBI        4           NA                  1          4
23       MCS            44   M       CVA        25          2.7                 5          5
24       UWS            66   M       TBI        6.5         2.4                 3          4
25       MCS            38   M       TBI        2.5         3.2                 3          3
26       MCS            70   M       TBI        1.5         NA                  1          1
27       MCS            22   M       TBI        3           NA                  1          3
28       UWS            49   M       CVA        5           2.6                 2          4
30       MCS            22   M       TBI        dl          NA                  1          1
31       UWS            64   M       TBI        2           2.6                 4          2
32       UWS            61   F       CVA        3.5         2.6                 3          4
33       MCS            23   M       TBI        8           NA                  1          7
34       MCS            19   M       TBI        2.5         2.5                 6          8
35       MCS            75   F       Infection  2.5         NA                  1          il
36       UWS            68   F       TBI        25          1.4                 3          3
37       UWS            39   F       CVA        2           2.1                 5          5
38       MCS            55   F       CVA        4.5         0.9                 2          4
39       MCS            59   M       TBI        3           2.0                 2          2
40       MCS            51   F       CVA        2.5         1.6                 2          2
43       UWS            27   M       TBI        1           2.2                 2          3
44       MCS            48   M       TBI        5.5         2.3                 2          2
45       UWS            33   M       TBI        2           1.6                 4          4
46       MCS            58   M       CVA        3.5         NA                  1          3
47       UWS            48   M       ABI        6           NA                  1          2
48       MCS            69   M       CVA        3           2.0                 7          8
49       MCS            45   M       TBI        10          4.0                 4          4
Total                                                                           146        181

ABI, anoxic brain injury; CVA, cerebrovascular accident; F, female; M, male; TBI, traumatic brain injury. n = 43. NA, not applicable: when only one included session was available for a patient,
there was no inter-session interval.
Extended Data Table 2 | Damage to olfactory-related brain areas

Patient  Etiology   Cribriform plate  Frontal lobe  Orbitofrontal  Thalamic
number              damage            damage        damage         damage
1        TBI        No                No            No             No
3        ABI        No                No            No             No
4        TBI        No                Yes           Yes            No
5        TBI        No                No            No             No
6        TBI        Suspected         Yes           No             No
7        TBI        No                No            No             No
8        ABI        Suspected         No            No             No
9        ABI        No                No            No             No
11       CVA        No                Yes           Yes            No
12       ABI        No                No            No             No
13       CVA        No                Yes           Yes            Yes
14       TBI        No                No            No             Yes
16       TBI        Yes               Yes           No             Yes
17       TBI        No                No            No             No
18       TBI        Suspected         Yes           Yes            No
19       TBI        Suspected         No            No             No
20       TBI        No                Yes           No             No
21       TBI        No                No            No             Yes
22       TBI        No                Yes           Yes            No
23       CVA        No                No            No             No
24       TBI        No                No            No             No
25       TBI        No                No            No             No
26       TBI        No                Yes           No             No
27       TBI        No                Yes           Yes            No
28       CVA        No                No            No             No
30       TBI        No                Yes           Yes            No
31       TBI        No                No            No             No
32       CVA        No                Yes           No             No
33       TBI        Suspected         Yes           Yes            Yes
34       TBI        No                Yes           No             No
35       Infection  No                No            No             No
36       TBI        No                Yes           No             Yes
37       CVA        No                Yes           No             No
38       CVA        No                Yes           Yes            Yes
39       TBI        No                Yes           No             No
40       CVA        No                No            No             Yes
43       TBI        No                No            No             Yes
44       TBI        No                Yes           Yes            No
45       TBI        No                No            No             No
46       CVA        No                No            No             Yes
47       ABI        No                No            No             No
48       CVA        No                No            No             No
49       TBI        No                Yes           No             No


Readings of structural imaging (computed tomography and magnetic resonance imaging), which were mostly conducted directly after injury (n = 43). If traumatic brain injuries were not equally
distributed across the patient subgroups, this may have biased the outcome. To address this, we compared the proportion of traumatic brain injuries in each patient group (patients in a MCS,
patients who transitioned between MCS and UWS, and patients with UWS) and observed no differences (11 out of 19 patients in a MCS, 10 out of 16 transitioning patients, and 6 out of 8 patients with UWS;
χ² = 0.71, P = 0.7). Moreover, we repeated the analysis excluding the 27 patients with traumatic brain injuries. The remaining aetiologies were anoxic brain injuries (n = 5), cerebrovascular accidents
(n = 10) and infection (n = 1). Repeating the analysis with only 37% of the original sample size, similar results were obtained. At the first sniff after odorant presentation, MCS sniff volume
was significantly smaller than UWS sniff volume for both pleasant (MCS, mean ± s.d. = 0.88 ± 0.22 NFU, median = 0.91 NFU; UWS, mean ± s.d. = 0.97 ± 0.15 NFU, median = 1.0 NFU; Z = 2.2, P = 0.027)
and unpleasant odorants (MCS, mean ± s.d. = 0.84 ± 0.20 NFU, median = 0.89 NFU; UWS, mean ± s.d. = 0.99 ± 0.13 NFU, median = 0.99 NFU; Z = 3.26, P = 0.001). As for the cognitively driven sniff
response, we observed only a weak trend in this limited sample (MCS, mean ± s.d. = 0.965 ± 0.12 NFU, median = 0.99 NFU; UWS, mean ± s.d. = 1.0 ± 0.18 NFU, median = 1.0 NFU; Z = 1.1, P = 0.274).
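
The comparison of traumatic brain injury proportions across the three patient groups can be reproduced approximately with a short calculation. The sketch below assumes that the reported statistic (0.71, P = 0.7) comes from a Pearson chi-square test of independence on the 3 x 2 table of TBI versus non-TBI counts, without continuity correction. The note does not name the test, so this is an assumption, and the function name is hypothetical. With two degrees of freedom the chi-square survival function has the closed form exp(-x/2), which avoids any external library.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Rows: MCS, transitioning, UWS. Columns: TBI, non-TBI (from the note above).
counts = [[11, 19 - 11], [10, 16 - 10], [6, 8 - 6]]
statistic, dof = chi_square_independence(counts)

# Closed form valid only for dof == 2 (chi-square survival function).
p_value = math.exp(-statistic / 2)
print(f"chi-square = {statistic:.2f}, dof = {dof}, P = {p_value:.2f}")
```

Under this assumption the calculation gives a statistic of about 0.71 with P of about 0.70, in line with the values reported in the note.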
Life sciences study design


All studies must disclose on these points even when the disclosure is negative.


Sample size Data were collected from 50 patients over 190 sessions. We chose a sample size consistent with published studies in this field (Nigri et al., 2015,
EJN).


Data exclusions Data exclusion was based on nasal respiration. Out of 50 patients and 190 sessions, 43 patients and 146 sessions were included. A total of 41
olfactory testing sessions were excluded owing to a lack of stable nasal respiration and 3 sessions were excluded owing to sniff volume exceeding 3
SD. Seven patients were excluded owing to the lack of a single session with stable nasal respiration. The rationale behind session exclusion was
insufficient data collection. The rationale behind trial exclusion was unreliability due to noise. The number of trials required for inclusion was
based on previous studies (Arzi et al., 2012, Nat. Neuro.; Arzi et al., 2014, J. Neuro.). Detailed inclusion/exclusion criteria can be found in the
Methods section.


Replication To verify the reproducibility of the findings we conducted several control analyses, verifying that the findings are driven by olfactory sniff
responses and not by odor trigeminality, nor by changes in nasal respiration due to tracheostomy. Replication of the sniff response in MCS but
not in VS/UWS using pure olfactory odorants was successful. In addition, we conducted effect size analyses to provide information on the
strength of evidence supporting the findings.


Randomization Patients were allocated to the VS/UWS or MCS group based on behavioural assessments of their consciousness state on the day of the olfactory
sniff-response test. Therefore, randomization between VS/UWS and MCS is not relevant.


Blinding On the first olfactory testing session of each patient, the experimenters were unaware of the patient's state of consciousness and conducted
an independent behavioural evaluation. As a patient's state of consciousness fluctuates, the behavioural state was assessed in each
session by the same experimenter, and therefore blinding beyond the first session was not applicable. The analysis of sniff responses is
automated and blind to the patients' state of consciousness, recovery and survival.
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Policy information about studies involving human research participants 


Population characteristics Disorders of consciousness (DoC) patients 


Recruitment Patients were recruited by the Loewenstein rehabilitation hospital physicians. The legal guardians of every patient arriving at the
intensive care and rehabilitation of consciousness unit in the hospital were asked whether they were willing to participate in the study.
The physicians recruiting the patients were not involved in data collection in order to minimize any potential bias in recruitment. 


Ethics oversight The study was approved by the ethics committee of Loewenstein rehabilitation hospital. Written informed consent was obtained 
from the patients’ legal guardian. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 
Genetic variants that inactivate protein-coding genes are a powerful source of
information about the phenotypic consequences of gene disruption: genes that are
crucial for the function of an organism will be depleted of such variants in natural
populations, whereas non-essential genes will tolerate their accumulation. However,
predicted loss-of-function variants are enriched for annotation errors, and tend to be
found at extremely low frequencies, so their analysis requires careful variant
annotation and very large sample sizes. Here we describe the aggregation of 125,748
exomes and 15,708 genomes from human sequencing studies into the Genome 
Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted 
loss-of-function variants in this cohort after filtering for artefacts caused by 
sequencing and annotation errors. Using an improved model of human mutation 
rates, we classify human protein-coding genes along a spectrum that represents 
tolerance to inactivation, validate this classification using data from model organisms 
and engineered human cells, and show that it can be used to improve the power of 
gene discovery for both common and rare diseases. 


The physiological function of most genes in the human genome remains
unknown. In biology, as in many engineering and scientific fields, breaking
the individual components of a complex system can provide valuable
insight into the structure and behaviour of that system. For the
discovery of gene function, a common approach is to introduce disruptive
mutations into genes and determine their effects on cellular
and physiological phenotypes in mutant organisms or cell lines. Such
studies have yielded valuable insight into eukaryotic physiology and
have guided the design of therapeutic agents. However, although
studies in model organisms and human cell lines have been crucial in
deciphering the function of many human genes, they remain imperfect
proxies for human physiology.

Obvious ethical and technical constraints prevent the large-scale
engineering of loss-of-function mutations in humans. However, recent
exome and genome sequencing projects have revealed a surprisingly
high burden of natural predicted loss-of-function (pLoF) variation in the
human population, including stop-gained, essential splice, and frameshift
variants, which can serve as natural models for inactivation of human genes. Such
variants have already revealed much about human biology and disease
mechanisms, through many decades of study of the genetic basis of
severe Mendelian diseases, most of which are driven by disruptive variants
in either the heterozygous or homozygous state. These variants
have also proved valuable in identifying potential therapeutic targets:
confirmed LoF variants in the PCSK9 gene have been causally linked to
low levels of low-density lipoprotein cholesterol, and have ultimately
led to the development of several inhibitors of PCSK9 that are now in
clinical use for the reduction of cardiovascular disease risk. A systematic
catalogue of pLoF variants in humans and the classification of genes
along a spectrum of tolerance to inactivation would provide a valuable
resource for medical genetics, identifying candidate disease-causing
mutations, potential therapeutic targets, and windows into the normal
function of many currently uncharacterized human genes.

Several challenges arise when assessing LoF variants at scale. LoF 
variants are on average deleterious, and are thus typically main- 
tained at very low frequencies in the human population. Systematic 
genome-wide discovery of these variants requires whole-exome or 
whole-genome sequencing of very large numbers of samples. In addi- 
tion, LoF variants are enriched for false positives compared with syn- 
onymous or other benign variants, including mapping, genotyping 
(including somatic variation), and particularly, annotation errors, and
careful filtering is required to remove such artefacts. 

Population surveys of coding variation enable the evaluation of the
strength of natural selection at a gene or region level. As natural selection
purges deleterious variants from human populations, methods to
detect selection have modelled the reduction in variation (constraint)
or shift in the allele frequency distribution, compared to an expectation.
For analyses of selection on coding variation, synonymous variation
provides a convenient baseline, controlling for other potential
population genetic forces that may influence the amount of variation
as well as technical features of the local sequence. A model of constraint
was previously applied to define a set of 3,230 genes with a high probability
of intolerance to heterozygous pLoF variation (pLI) and to estimate
the selection coefficient for variants in these genes. However,
the ability to comprehensively characterize the degree of selection
against pLoF variants is particularly limited, as for small genes, the
expected number of mutations is still very low, even for samples of up
to 60,000 individuals. Furthermore, the previous dichotomization
of pLI, although convenient for the characterization of a set of genes,
disguises variability in the degree of selective pressure against a given
class of variation and overlooks more subtle levels of intolerance to
pLoF variation. With larger sample sizes, a more accurate quantitative
measure of selective pressure is possible.

Here, we describe the detection of pLoF variants in a cohort of 125,748
individuals with whole-exome sequence data and 15,708 individuals
with whole-genome sequence data, as part of the Genome Aggregation
Database (gnomAD; https://gnomad.broadinstitute.org), the successor
to the Exome Aggregation Consortium (ExAC). We develop a continuous
measure of intolerance to pLoF variation, which places each gene
on a spectrum of LoF intolerance. We validate this metric by comparing
its distribution to several orthogonal indicators of constraint, includ- 
ing the incidence of structural variation and the essentiality of genes 
as measured using mouse gene knockout experiments and cellular 
inactivation assays. Finally, we demonstrate that this metric improves 
the interpretation of genetic variants that influence rare disease and 
provides insight into common disease biology. These analyses provide, 
to our knowledge, the most comprehensive catalogue so far of the 
sensitivity of human genes to disruption. 

In a series of accompanying manuscripts, other complementary
analyses of this dataset are described. Using an overlapping set of 14,237
whole genomes, the discovery and characterization of a wide variety of
structural variants (large deletions, duplications, insertions, or other
rearrangements of DNA) is reported. The value of pLoF variants for
the discovery and validation of therapeutic drug targets is explored,
and a case study of the use of these variants from gnomAD and other
large reference datasets is provided to validate the safety of inhibition
of LRRK2, a candidate therapeutic target for Parkinson's disease. By
combining the gnomAD dataset with a large collection of RNA sequencing
data from adult human tissues, the value of tissue expression
data in the interpretation of genetic variation across a range of human
diseases is reported. Finally, the effect of two understudied classes of
human variation (multi-nucleotide variants, and variants that create
or disrupt open-reading frames in the 5' untranslated region of human
genes) is characterized and investigated.


A high-quality catalogue of variation 


We aggregated whole-exome sequencing data from 199,558 individuals 
and whole-genome sequencing data from 20,314 individuals. These 
data were obtained primarily from case-control studies of common 
adult-onset diseases, including cardiovascular disease, type 2 diabetes
and psychiatric disorders. Each dataset, totalling more than 1.3
and 1.6 petabytes of raw sequencing data, respectively, was uniformly
processed; joint variant calling was performed on each dataset using a
standardized BWA-Picard-GATK pipeline, and all data processing and
analysis was performed using Hail. We performed stringent sample
quality control (Extended Data Fig. 1), removing samples with lower 
sequencing quality by a variety of metrics, samples from second-degree 
or closer related individuals across both datatypes, samples with inad- 
equate consent for the release of aggregate data, and samples from indi- 
viduals known to have a severe childhood-onset disease as well as their 
first-degree relatives. The final gnomAD release contains genetic vari- 
ation from 125,748 exomes and 15,708 genomes from unique unrelated 
individuals with high-quality sequence data, spanning 6 global and 8 
sub-continental ancestries (Fig. 1a, b), which we have made publicly 
available at https://gnomad.broadinstitute.org. We also provide subsets 
of the gnomAD datasets, which exclude individuals who are cases in 
case-control studies, or who are cases of a few particular disease types 
such as cancer and neurological disorders, or who are also aggregated
in the Bravo TOPMed variant browser (https://bravo.sph.umich.edu).

Among these individuals, we discovered 17.2 million and 261.9 mil- 
lion variants in the exome and genome datasets, respectively; these 
variants were filtered using a custom random forest process (Supple- 
mentary Information) to 14.9 million and 229.9 million high-quality 
variants. Comparing our variant calls in two samples for which we had
independent gold-standard variant calls, we found that our filtering 
achieves very high precision (more than 99% for single nucleotide 
variants (SNVs), over 98.5% for indels in both exomes and genomes) 
and recall (over 90% for SNVs and more than 82% for indels for both 
exomes and genomes) at the single sample level (Extended Data Fig. 2). 
In addition, we leveraged data from 4,568 and 212 trios included in 
our exome and genome call-sets, respectively, to assess the quality of 
our rare variants. We found that our model retains over 97.8% of the 
transmitted singletons (singletons in the unrelated individuals that 
are transmitted to an offspring) on chromosome 20 (which was not 
used for model training) (Extended Data Fig. 3a-d). In addition, the 
number of putative de novo calls after filtering is in line with expectations
(Extended Data Fig. 3e-h), and our model had a recall of 97.3% for
de novo SNVs and 98% for de novo indels based on 375 independently 
validated de novo variants in our whole-exome trios (295 SNVs and 80 
indels) (Extended Data Fig. 3i, j). Altogether, these results indicate that 
our filtering strategy produced a call-set with high precision and recall 
for both common and rare variants. 
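
For readers less familiar with the two metrics used in this paragraph, the sketch below shows how precision and recall are computed when a call-set is compared against a gold-standard set. The counts are hypothetical and chosen only to give round numbers; they are not taken from the gnomAD evaluation, and the function name is illustrative.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision and recall of a call-set against a gold-standard set.

    true_positives:  called variants that are in the gold standard
    false_positives: called variants that are not in the gold standard
    false_negatives: gold-standard variants that were not called
    """
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Hypothetical counts for illustration only.
precision, recall = precision_recall(true_positives=990,
                                     false_positives=10,
                                     false_negatives=110)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
```

High precision means that few retained calls are artefacts, whereas high recall means that few true variants are lost to filtering; the paragraph above reports both because a filter can trade one for the other.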

These variants reflect the expected patterns based on mutation and 
selection: we observe 84.9% of all possible consistently methylated 
CpG-to-TpG transitions that would create synonymous variants in the 
human exome (Supplementary Table 14), which indicates that at this 
Fig. 1 | Aggregation of 141,456 exome and genome sequences. a, Uniform
manifold approximation and projection (UMAP) plot depicting the
ancestral diversity of all individuals in gnomAD, using ten principal
components. Note that long-range distances in the UMAP space are not a proxy
for genetic distance. b, The number of individuals by population and
subpopulation in the gnomAD database. Colours representing populations in a
and b are consistent. c, d, The mutability-adjusted proportion of singletons
(MAPS) is shown across functional categories for SNVs in exomes (c; x axis
shared with e and g) and genomes (d; x axis shared with f and h). Higher values


sample size, we are beginning to approach mutational saturation of 
this highly mutable and weakly negatively selected variant class. How- 
ever, we only observe 52% of methylated CpG stop-gained variants, 
which illustrates the action of natural selection removing a substantial 
fraction of gene-disrupting variants from the population (Fig. 1c-h). 
Across all mutational contexts, only 11.5% and 3.7% of the possible syn- 
onymous and stop-gained variants, respectively, are observed in the 
exome dataset, which indicates that current sample sizes remain far 
from capturing complete mutational saturation of the human exome 
(Extended Data Fig. 4). 


Identifying loss-of-function variants 

Some LoF variants will result in embryonic lethality in humans in a heterozygous
state, whereas others are benign even at homozygosity, with
a wide spectrum of effects in between. Throughout this manuscript,
we define pLoF variants to be those that introduce a premature stop
(stop-gained), shift the reported transcriptional frame (frameshift), or
alter the two essential splice-site nucleotides immediately to the left
and right of each exon (splice) found in protein-coding transcripts, and
ascertain their presence in the cohort of 125,748 individuals with exome
sequence data. As these variants are enriched for annotation artefacts,
indicate an enrichment of lower frequency variants, which suggests increased
deleteriousness. e, f, The proportion of possible variants observed for each
functional class for each mutational type for exomes (e) and genomes (f). CpG
transitions are more saturated, except where selection (for example, pLoFs) or
hypomethylation (5' untranslated region) decreases the number of
observations. g, h, The total number of variants observed in each functional
class for exomes (g) and genomes (h). Error bars in c-f represent 95%
confidence intervals (note that in some cases these are fully contained within
the plotted point).


we developed the loss-of-function transcript effect estimator (LOFTEE) 
package, which applies stringent filtering criteria from first principles 
(such as removing terminal truncation variants, as well as rescued splice 
variants, that are predicted to escape nonsense-mediated decay) to 
pLoF variants annotated by the variant effect predictor (Extended Data 
Fig. 5a). Despite not using frequency information, we find that this 
method disproportionately removes pLoF variants that are common in
the population, which are known to be enriched for annotation errors,
while retaining rare, probable deleterious variations, as well as reported 
pathogenic variation (Fig. 2a). LOFTEE distinguishes high-confidence 
pLoF variants from annotation artefacts, and identifies a set of putative 
splice variants outside the essential splice site. The filtering strategy of 
LOFTEE is conservative in the interest of increasing specificity, filtering 
some potentially functional variants that display a frequency spectrum 
consistent with that of missense variation (Fig. 2b). Applying LOFTEE 
v1.0, we discover 443,769 high-confidence pLoF variants, of which 
413,097 fall on the canonical transcripts of 16,694 genes. The number 
of pLoF variants per individual is consistent with previous reports, and
is highly dependent on the frequency filters chosen (Supplementary 
Table 17). 

Aggregating across variants, we created a gene-level pLoF frequency 
metric to estimate the proportion of haplotypes that contain an inactive 
Fig. 2 | Generating a high-confidence set of pLoF variants. a, The percentage
of variants filtered by LOFTEE grouped by ClinVar status and gnomAD
frequency. Despite not using frequency information, LOFTEE removes a larger
proportion of common variants, and a very low proportion of reported
disease-causing variation. b, MAPS (see Fig. 1c, d) is shown by LOFTEE
designation and filter. Variants filtered out by LOFTEE exhibit frequency
spectra that are similar to those of missense variants; predicted splice variants
outside the essential splice site are more rare, and high-confidence variants are
very likely to be singletons. Only SNVs with at least 80% call rate are included


copy of each gene. We find that 1,555 genes have an aggregate pLoF 
frequency of at least 0.1% across all individuals in the dataset (Extended 
Data Fig. 5c), and 3,270 genes have an aggregate pLoF frequency of at 
least 0.1% in any one population. Furthermore, we characterized the 
landscape of genic tolerance to homozygous inactivation, identifying 
4,332 pLoF variants that are homozygous in at least one individual. 
Given the rarity of true homozygous LoF variants, we expected sub- 
stantial enrichment of such variants for sequencing and annotation 
errors, and we subjected this set to additional filtering and deep manual 
curation before defining a set of 1,815 genes (2,636 high-confidence 
variants) that are likely to be tolerant to biallelic inactivation (Sup- 
plementary Data 7). 
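
One simple way to picture a gene-level pLoF frequency is sketched below. This is not the authors' estimator, whose details are not given in this section. It assumes, purely for illustration, that the pLoF variants in a gene occur independently on haplotypes, so that the proportion of haplotypes carrying at least one pLoF variant is one minus the product of the per-variant non-carrier proportions. The function name and the allele frequencies are hypothetical.

```python
import math


def aggregate_plof_frequency(allele_frequencies):
    """Proportion of haplotypes carrying at least one pLoF variant in a gene.

    Assumes variants are independent (an illustrative simplification).
    """
    proportion_without_plof = math.prod(1.0 - af for af in allele_frequencies)
    return 1.0 - proportion_without_plof


# Hypothetical allele frequencies of three pLoF variants in one gene.
example_frequencies = [0.0005, 0.0003, 0.0004]
aggregate = aggregate_plof_frequency(example_frequencies)

# The text uses 0.1% as the reporting threshold for aggregate frequency.
exceeds_threshold = aggregate >= 0.001
print(f"aggregate pLoF frequency = {aggregate:.4%}, >= 0.1%: {exceeds_threshold}")
```

For rare variants the result is close to the plain sum of allele frequencies, which is why individually rare pLoF variants can still add up to an aggregate frequency above 0.1%.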


The LoF intolerance of human genes 


Just as a preponderance of pLoF variants is useful for identifying 
LoF-tolerant genes, we can conversely characterize the intolerance of a
gene to inactivation by identifying marked depletions of predicted LoF
variation. Here, we present a refined mutational model, which incorporates
methylation, base-level coverage correction, and LOFTEE
(Supplementary Information, Extended Data Fig. 6), to predict expected
levels of variation under neutrality. Under this updated model, the 


here. Error bars represent 95% confidence intervals. c,d, The total number of 
pLoF variants (c), and proportion of genes with more than ten pLoF variants (d) 
observed and expected (in the absence of selection) as a function of sample 
size (downsampled from gnomAD). Selection reduces the number of variants 
observed, and variant discovery approximately follows a square-root 
relationship with the number of samples. At current sample sizes, we would 
expect to identify more than 10 pLoF variants for 72.1% of genes in the absence
of selection. 


variation in the number of synonymous variants observed is accurately 
captured (r=0.979). We then applied this method to detect depletion 
of pLoF variation by comparing the number of observed pLoF variants 
against our expectation in the gnomAD exome data from 125,748 indi- 
viduals—more than doubling the sample size of ExAC, the previously 
largest exome collection. For this dataset, we computed a median of
17.9 expected pLoF variants per gene (Fig. 2c) and found that 72.1% of 
genes have more than 10 pLoF variants (powered to be classified into 
the most constrained genes) (Supplementary Information) expected 
on the canonical transcript (Fig. 2d), an increase from 13.2 and 62.8%,
respectively, in ExAC. 

The smaller sample size in ExAC required a transformation of the 
observed and expected values for the number of pLoF variants in each 
gene into the pLI: this metric estimates the probability that a gene 
falls into the class of LoF-haploinsufficient genes (approximately 10% 
observed/expected variation) and is ideally used as a dichotomous 
metric (producing 3,230 genes with pLI > 0.9). Here, our refined model 
and substantially increased sample size enabled us to directly assess the 
degree of intolerance to pLoF variation in each gene using the continu- 
ous metric of the observed/expected ratio and to estimate a confidence 
interval around the ratio. We find that the median observed/expected 
ratio is 48%, which indicates that, as noted previously, most genes 
Fig. 3 | The functional spectrum of pLoF impact. a, The percentage of genes
in a set of curated gene lists represented in each LOEUF decile.
Haploinsufficient genes are enriched among the most constrained genes, 
whereas recessive genes are spread in the middle of the distribution, and 
olfactory receptor genes are largely unconstrained. b, The occurrence of 6,735 
rare LoF deletion structural variants (SVs) is correlated with LOEUF (computed 


exhibit at least moderate selection against pLoF variation, and that 
the distribution of the observed/expected ratio is not dichotomous, 
but continuous (Extended Data Fig. 7a). For downstream analyses, 
unless otherwise specified, we use the 90% upper bound of this confi- 
dence interval, which we term the loss-of-function observed/expected 
upper bound fraction (LOEUF) (Extended Data Fig. 7b, c), and bin 19,197
genes into deciles of approximately 1,920 genes each. At current sample 
sizes, this metric enables the quantitative assessment of constraint 
with a built-in confidence value, and distinguishes small genes (for 
example, those with observed = 0, expected = 2; LOEUF = 1.34) from 
large genes (for example, observed = 0, expected = 100; LOEUF = 0.03),
while retaining the continuous properties of the direct estimate of the 
ratio (Supplementary Information). At one extreme of the distribu- 
tion, we observe genes with a very strong depletion of pLoF variation 
(first LOEUF decile aggregate observed/expected approximately 6%) 
(Extended Data Fig. 7e), including genes previously characterized as 
high pLI (Extended Data Fig. 7f). By contrast, we find unconstrained 
genes that are relatively tolerant of inactivation, including many that 
contain homozygous pLoF variants (Extended Data Fig. 7g). 
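
To make the effect of the upper bound concrete, the following Python sketch estimates a 90% interval around the observed/expected ratio and returns its upper limit. It assumes a Poisson likelihood for the observed count with mean expected × ratio, evaluated on a uniform grid of ratios from 0 to 2. The function name `oe_upper_bound`, the grid range and the step size are illustrative assumptions rather than details given in this section; the exact procedure is described in the Supplementary Information.

```python
import math


def oe_upper_bound(observed, expected, level=0.95, grid_max=2.0, step=0.001):
    """Upper limit of a two-sided 90% interval on the observed/expected ratio.

    Assumption (not taken from this section): the observed count is Poisson
    with mean expected * ratio, and ratios are weighted uniformly on a grid.
    """
    n_points = int(round(grid_max / step))
    ratios = [i * step for i in range(n_points + 1)]

    weights = []
    for ratio in ratios:
        mu = expected * ratio
        if mu == 0.0:
            # A Poisson with mean 0 puts all its mass on a count of 0.
            weights.append(1.0 if observed == 0 else 0.0)
        else:
            log_pmf = observed * math.log(mu) - mu - math.lgamma(observed + 1)
            weights.append(math.exp(log_pmf))

    total = sum(weights)
    cumulative = 0.0
    for ratio, weight in zip(ratios, weights):
        cumulative += weight / total
        if cumulative >= level:
            return ratio
    return grid_max


# The two examples quoted in the text: a small and a large gene,
# both with zero observed pLoF variants.
print(oe_upper_bound(observed=0, expected=2))
print(oe_upper_bound(observed=0, expected=100))
```

Under these assumptions the sketch gives roughly 1.35 for the small gene and 0.03 for the large gene, close to the values quoted above: with the same observed count of zero, only the gene with many expected variants receives a confidently low score.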

We note that the use of the upper bound means that LOEUF is a 
conservative metric in one direction: genes with low LOEUF scores 
are confidently depleted for pLoF variation, whereas genes with high 
LOEUF scores are a mixture of genes without depletion, and genes that
are too small to obtain a precise estimate of the observed/expected 
ratio. In general, however, the scale of gnomAD means that gene length 
is rarely a substantive confounder for the analyses described here, 
and all downstream analyses are adjusted for the length of the coding 
sequence or filtered to genes with at least ten expected pLoFs (Sup- 
plementary Information). 


Validation of the LoF-intolerance score 


The LOEUF metric allows us to place each gene along a continuous 
spectrum of tolerance to inactivation. We examined the correlation of 
from SNVs; linear regression r=0.13; P=9.8 x 10~). Error bars represent 95%
confidence intervals from bootstrapping. c, d, Constrained genes are more 
likely to be lethal when heterozygously inactivated in mouse and cause cellular 
lethality when disrupted in human cells (c), whereas unconstrained genes are 
more likely to be tolerant of disruption in cellular models (d). For all panels, 
more constrained genes are shown on the left.


this metric with several independent measures of genic sensitivity to 
disruption. First, we found that LOEUF is consistent with the expected 
behaviour of well-established gene sets: known haploinsufficient genes 
are strongly depleted of pLoF variation, whereas olfactory receptors are 
relatively unconstrained, and genes with a known autosomal recessive 
mechanism, for which selection against heterozygous disruptive variants
tends to be present but weak, fall in the middle of the distribution
(Fig. 3a). In addition, LOEUF is positively correlated with the occur- 
rence of 6,735 rare autosomal deletion structural variants overlapping 
protein-coding exons identified in a subset of 6,749 individuals with 
whole-genome sequencing data in this manuscript" (r= 0.13; P=9.8 
10°) (Fig. 3b). 

This constraint metric also correlates with results in model sys- 
tems: in 389 genes with orthologues that are embryonically lethal 
after heterozygous deletion in mouse”, we find a lower LOEUF 
score (mean = 0.488), compared with the remaining 18,808 genes 
(mean = 0.962; t-test P=10™’) (Fig. 3c). Similarly, the 678 genes that are 
essential for human cell viability as characterized by CRISPR screens” 
are also depleted for pLoF variation (mean LOEUF = 0.63) in the gen- 
eral population compared to background (18,519 genes with mean 
LOEUF = 0.964; t-test P=9 x 10°”), whereas the 777 non-essential genes 
are more likely to be unconstrained (mean LOEUF = 1.34, compared to 
remaining 18,420 genes with mean LOEUF = 0.936; t-test P=3 x 10°”) 
(Fig. 3d). 


Biological properties of constraint 

We investigated the properties of genes and transcripts as a func- 
tion of their tolerance to pLoF variation (LOEUF). First, we found 
that LOEUF correlates with the degree of connection of a gene in 
protein-interaction networks (r=-0.14; P=1.7 x 10™ after adjusting 
for gene length) (Fig. 4a) and functional characterization (Extended 
Data Fig. 8a). In addition, constrained genes are more likely to be ubiq- 
uitously expressed across 38 tissues in the Genotype-Tissue Expression 
Fig. 4 | Biological properties of constrained genes and transcripts. a, The
mean number of protein-protein interactions is plotted as a function of LOEUF 
decile: more constrained genes have more interaction partners (LOEUF linear 
regression r=-0.14; P=1.7 x10™). Error bars correspond to 95% confidence
intervals. b, The number of tissues where a gene is expressed (transcripts per 
million > 0.3), binned by LOEUF decile, is shown as a violin plot with the mean 
number overlaid as points: more constrained genes are more likely to be 
expressed in several tissues (LOEUF linear regression r=-0.31; P<1x107).
c, For 1,740 genes in which there exists at least one constrained and one 
unconstrained transcript, the proportion of expression derived from the 
constrained transcript is plotted as a histogram.


(GTEx) project (Fig. 4b) (LOEUF r=-0.31; P<1x 10°”) and have higher
expression on average (LOEUF p =-0.28; P<1x107”), consistent with 
previous results*. Although most results in this study are reported at the 
gene level, we have also extended our framework to compute LOEUF 
for all protein-coding transcripts, allowing us to explore the extent of 
differential constraint of transcripts within a given gene. In cases in 
which a gene contained transcripts with varying levels of constraint, we


found that transcripts in the first LOEUF decile were more likely to be 
expressed across tissues than others in the same gene (n=1,740 genes), 
even when adjusted for transcript length (Fig. 4c) (constrained tran- 
scripts are on average 6.34 transcripts per million higher; P=2.2 x10“). 
Furthermore, we found that the most constrained transcript for each 
gene was typically the most highly expressed transcript in tissues with 
disease relevance (Extended Data Fig. 8c), which supports the need
for transcript-based variant interpretation, as explored in more depth
in an accompanying manuscript.

Finally, we investigated potential differences in LOEUF across human 
populations, restricting to the same sample size across all populations 
to remove bias due to differential power for variant discovery. As the 
smallest population in our exome dataset (African/African American) 
has only 8,128 individuals, our ability to detect constraint against pLoF 
variants for individual genes is limited. However, for well-powered 
genes (expected pLoF >10) (Supplementary Information), we observed 
a lower mean observed/expected ratio and LOEUF across genes among
African/African American individuals, a population with a larger effec- 
tive population size, compared with other populations (Extended Data 
Fig. 8d, e), consistent with the increased efficiency of selection in
populations with larger effective population sizes.


Constraint informs disease aetiologies 


The LOEUF metric can be applied to improve molecular diagnosis and 
advance our understanding of disease mechanisms. Disease-associated 
genes, discovered by different technologies over the course of many 
years across all categories of inheritance and effects, span the entire 
spectrum of LoF tolerance (Extended Data Fig. 9a). However, in recent
years, high-throughput sequencing technologies have enabled the 
identification of highly deleterious variants that are de novo or only 
inherited in small families or trios, leading to the discovery of novel dis- 
ease genes under extreme constraint against pLoF variation that could 
not have been identified by linkage approaches that rely on broadly 
inherited variation (Extended Data Fig. 9b). This result is consistent 
with a recent analysis that shows a post-whole-exome/whole-genome
sequencing era enrichment for gene-disease relationships attributable
to de novo variants.

Rare variants, which are more likely to be deleterious, are expected 
to exhibit stronger effects on average in constrained genes (previously 
shown using pLI from ExAC), with an effect size related to the severity
and reproductive fitness of the phenotype. In an independent cohort 
of 5,305 individuals with intellectual disability or developmental dis- 
orders and 2,179 controls, the rate of pLoF de novo variation in cases 
is 15-fold higher in genes belonging to the most constrained LOEUF 
decile, compared with controls (Fig. 5a), with a slightly increased rate 
(2.9-fold) in the second highest decile but not in others. A similar, but 
attenuated enrichment (4.4-fold in the most constrained decile) is seen
for de novo variants in 6,430 patients with autism spectrum disorder 
(Extended Data Fig. 9c). Furthermore, in burden tests of rare variants 
(allele count across both cases and controls = 1) of patients with
schizophrenia, we find a significantly higher odds ratio in constrained genes
(Extended Data Fig. 9d). 
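
The rate ratio used in these comparisons is the number of de novo variants per individual in cases divided by the same rate in controls. A minimal Python sketch follows. The cohort sizes are those quoted in the text, but the variant counts are hypothetical placeholders chosen only for illustration, and the log-scale normal approximation for the confidence interval is an assumption, not necessarily the method used for the figures.

```python
import math


def rate_ratio(case_count, n_cases, control_count, n_controls, z=1.96):
    """Return (ratio, lower, upper) for per-individual variant rates.

    The interval treats both counts as independent Poisson variables and
    uses a normal approximation on the log scale (an assumption).
    """
    ratio = (case_count / n_cases) / (control_count / n_controls)
    # Approximate standard error of log(ratio) for two Poisson counts.
    se_log = math.sqrt(1.0 / case_count + 1.0 / control_count)
    lower = ratio * math.exp(-z * se_log)
    upper = ratio * math.exp(z * se_log)
    return ratio, lower, upper


# Hypothetical de novo pLoF counts for a single LOEUF decile.
ratio, lower, upper = rate_ratio(case_count=120, n_cases=5305,
                                 control_count=5, n_controls=2179)
print(f"rate ratio = {ratio:.1f} (95% CI {lower:.1f}-{upper:.1f})")
```

Dividing each count by its cohort size before taking the ratio is what allows cohorts of different sizes, such as 5,305 cases and 2,179 controls, to be compared directly.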

Finally, although pLoF variants are predominantly rare, other 
more common variation in constrained genes may also be deleterious,
including the effects of other coding or regulatory variants. In a
heritability partitioning analysis of association results for 658 traits in 
the UK Biobank and other large-scale genome-wide association study 
(GWAS) efforts, we find an enrichment of common variant associations 
near genes that is linearly related to LOEUF decile across numerous 
traits (Fig. 5b). Schizophrenia and educational attainment are the most 
enriched traits (Fig. 5c), consistent with previous observations in
associations between rare pLoF variants and these phenotypes. This
enrichment persists even when accounting for gene size, expression 
in GTEx brain samples, and previously tested annotations of functional 
Fig. 5 | Disease applications of constraint. a, The rate ratio is defined by the
rate of de novo variants (number per patient) in 5,305 cases of intellectual 
disability/developmental delay (ID/DD) divided by the rate in 2,179 controls. 
pLoF variants in the most constrained decile of the genome are approximately 
11-fold more likely to be found in cases compared to controls. Error bars 
represent 95% confidence intervals. b, Marginal enrichment in per-SNV 
heritability explained by common (minor allele frequency > 5%) variants within 
100-kb of genes in each LOEUF decile, estimated by linkage disequilibrium (LD) 
score regression. Enrichment is compared to the average SNV genome-wide.
The results reported here are from random effects meta-analysis of 276 
independent traits (subsetted from the 658 traits with UK Biobank or 
large-scale consortium GWAS results). Error bars represent 95% confidence 
intervals. c, Conditional enrichment in per-SNV common variant heritability 
tested using regression of linkage disequilibrium score in each of 658 common 
disease and trait GWAS results. P values evaluate whether per-SNV heritability
is proportional to the LOEUF of the nearest gene, conditional on 75 existing 
functional, linkage disequilibrium, and minor-allele-frequency-related 
genomic annotations. Colours alternate by broad phenotype category. 


regions and evolutionary conservation, and suggests that some herit- 
able polygenic diseases and traits, particularly cognitive or psychiatric 
ones, have an underlying genetic architecture that is driven substan- 
tially by constrained genes (Extended Data Fig. 10). 


Discussion 


In this paper and accompanying publications, we present the largest,
to our knowledge, catalogue of harmonized variant data from any 
species so far, incorporating exome or genome sequence data from 
more than 140,000 humans. The gnomAD dataset of over 270 million
variants is publicly available (https://gnomad.broadinstitute.org), 
and has already been widely used as a resource for estimates of allele 
frequency in the context of rare disease diagnosis (for a recent review,
see Eilbeck et al.), improving power for disease gene discovery,
estimating genetic disease frequencies, and exploring the biological
effect of genetic variation. Here, we describe the application of this
dataset to calculate a continuous metric that describes a spectrum of 
tolerance to pLoF variation for each protein-coding gene in the human
genome. We validate this method using known gene sets and data from 
model organisms, and explore the value of this metric for investigating 
human gene function and discovery of disease genes. 

We have focused on high-confidence, high-impact pLoF variants, 
calibrating our analysis to be highly specific to compensate for the 
increased false-positive rate among deleterious variants. However, 
some additional error modes may still exist, and indeed, several recent 
experiments have proposed uncharacterized mechanisms for escape 
from nonsense-mediated mRNA decay. Furthermore, such a stringent
approach will remove some true positives. For example, terminal
truncations that are removed by LOFTEE may still exert a LoF mecha- 
nism through the removal of crucial C-terminal domains, despite the 
escape of the gene from nonsense-mediated decay. In addition, current 
annotation tools are incapable of detecting all classes of LoF varia- 
tion and typically miss, for instance, missense variants that inactivate 
specific gene functions, as well as high-impact variants in regulatory 
regions. Future work will benefit from the increasing availability of 
high-throughput experimental assays that can assess the functional 
effect of all possible coding variants in a target gene, although scaling
these experimental assays to all protein-coding genes represents a huge 
challenge. Identifying constraint in individual regulatory elements 
outside coding regions will be even more challenging, and require much 
larger sample sizes of whole genomes as well as improved functional 
annotation. We discuss one class of high-impact regulatory variants in
a companion manuscript, but many remain to be fully characterized.

Although the gnomAD dataset is of unprecedented scale, it has 
important limitations. At this sample size, we remain far from saturating 
all possible pLoF variants in the human exome; even at the most mutable 
sites in the genome (methylated CpG dinucleotides), we observe only
half of all possible stop-gained variants. A substantial fraction of the 
remaining variants are likely to be heterozygous lethal, whereas others 
will exhibit an intermediate selection coefficient; much larger sample 
sizes (in the millions to hundreds of millions of individuals) will be 
required for comprehensive characterization of selection against all 
individual LoF variants in the human genome. Such future studies would
also benefit substantially from increased ancestral diversity beyond 
the European-centric sampling of many current studies, which would 
provide opportunities to observe very rare and population-specific 
variation, as well as increase power to explore population differences 
in gene constraint. In particular, current reference databases including 
gnomAD have a near-complete absence of representation from the Middle
East, central and southeast Asia, Oceania, and the vast majority of
the African continent, and these gaps must be addressed if we are to
fully understand the distribution and effect of human genetic variation. 

It is also important to understand the practical and evolutionary 
interpretation of pLoF constraint. In particular, it should be noted that 
these metrics primarily identify genes undergoing selection against 
heterozygous variation, rather than strong constraint against homozygous
variation. In addition, the power of the LOEUF metric is affected
by gene length, with approximately 30% of the coding genes in the 
genome still insufficiently powered for detection of constraint even 
at the scale of gnomAD (Fig. 2d). Substantially larger sample sizes and 
careful analysis of individuals enriched for homozygous pLoFs (see 
below) will be useful for distinguishing these possibilities. Furthermore, 
selection is largely blind to phenotypes emerging after reproductive 
age, and thus genes with phenotypes that manifest later in life, even if 


severe or fatal, may exhibit much weaker intolerance to inactivation. 
Despite these caveats, our results demonstrate that pLoF constraint 
divides protein-coding genes in a way that correlates usefully with 
their probability of disease impact and other biological properties, 
and confirm the value of constraint in prioritizing candidate genes in 
studies of both rare and common diseases. 

Examples such as PCSK9 demonstrate the value of human pLoF vari- 
ants for identifying and validating targets for therapeutic intervention 
across a wide range of human diseases. As discussed in more detail in an
accompanying manuscript, careful attention must be paid to a variety
of complicating factors when using pLoF constraint to assess candidates. 
More valuable information comes from directly exploring the pheno- 
typic effect of LoF variants on carrier humans, both through ‘forward 
genetics’ approaches such as gene mapping to identify genes that cause
Mendelian disease, as well as ‘reverse genetics’ approaches that leverage 
large collections of sequenced humans to find and clinically characterize 
individuals with disruptive mutations in specific genes. Although clinical 
data are currently available for only a small subset of gnomAD individuals,
future efforts that integrate sequencing and deep phenotyping of large 
biobanks will provide valuable insight into the biological implications 
of partial disruption of specific genes. This is illustrated in a companion
manuscript that explores the clinical correlates of heterozygous pLoF 
variants in the LRRK2 gene, demonstrating that life-long partial
inactivation of this gene is likely to be safe in humans.

Such examples, and the sheer scale of pLoF discovery in this dataset, 
suggest the near-future feasibility and considerable value of a human
‘knockout’ project—a systematic attempt to discover the phenotypic 
consequences of functionally disruptive mutations, in either the het- 
erozygous or homozygous state, for all human protein-coding genes. 
Such an approach will require cohorts of samples from millions of 
sequenced and deeply, consistently phenotyped individuals and, for 
the discovery of ‘complete’ knockouts, would benefit substantially from 
the targeted inclusion of large numbers of samples from populations 
that have either experienced strong demographic bottlenecks or high 
levels of recent parental relatedness (consanguinity). Such a resource
would allow the construction of a comprehensive map that directly 
links gene-disrupting variation to human biology. 
Reporting summary
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 
The gnomAD 2.1.1 dataset is available for download at
http://gnomad.broadinstitute.org, where we have developed a browser for the dataset
and provide files with detailed frequency and annotation informa- 
tion for each variant. There are no restrictions on the aggregate data 
released. 


Code availability 


All code to perform quality control is provided at
https://github.com/broadinstitute/gnomad_qc, and the code to perform all
analyses and regenerate all the figures in this manuscript is provided at
https://github.com/macarthur-lab/gnomad_lof. LOFTEE is available at
https://github.com/konradjk/loftee. All code and software to reproduce figures
are available in a Docker image at konradjk/gnomad_lof_paper:0.2. 
Extended Data Fig. 1 | Overview of the sample quality control workflow.

a, Exome (square) and genome (circle) samples underwent quality control in
the following stages: hard filtering (step 1), relatedness inference (step 2), 
ancestry inference (step 3), platform inference (step 4, for exomes only), and 
population- and platform-specific outlier filtering (step 5). See Supplementary 
Information for further details. Except for samples failing hard filters (dotted 
outline), all quality control analyses were applied to all samples, regardless of 
the presence or absence of other quality control flags (such as relatedness, lack
of release permissions, or outlier status; red diagonal bar). Assignment of 
ancestry labels is represented by fill colour and accompanying three-letter 
ancestry group abbreviation. Assignment of platform labels is represented by 
outline colour and a numbered label for exomes (corresponding to imputed
platforms) and a PCR + label for genomes. The final set of samples included in
the gnomAD release (125,748 exomes and 15,708 genomes) was defined to be 
the set of unrelated samples with release permissions, no hard filter flags, and 
no population- and platform-specific outlier metrics (step 6). b, In exomes, the


chromosomal sex of samples was inferred based on the inbreeding coefficient 
on chromosome X and the coverage of chromosome Y into male (green), female
(amber), ambiguous sex (pink), and sex chromosome aneuploid (blue). c, The 
top two principal components from PCA-HDBSCAN analysis of exome capture 
regions. Sequencing platforms were inferred for exome samples based on 
principal component analysis of biallelic variant call rates over all known 
exome capture regions, and samples were assigned a cluster label (0-15, or 
unknown) using HDBSCAN. d, We performed platform- and 
population-specific outlier filtering for several quality-control metrics. The 
distribution of the number of deletions in samples from south Asian individuals 
across platforms is shown. Distributions (and accordingly, median and median 
absolute deviations) for these metrics varied widely both by population and 
sequencing platform (numbered on the y axis). Outliers (black dots) were
defined as samples with values outside four median absolute deviations 
(shown by dotted vertical lines) from the median of a given metric. 
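
The outlier rule in panel d can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `mad_outliers` and the example values are hypothetical, the median absolute deviation is used unscaled, and in practice the rule would be applied separately within each population and platform group rather than to a single pooled list.

```python
from statistics import median


def mad_outliers(values, n_mads=4.0):
    """Flag values lying more than n_mads median absolute deviations
    from the median of the supplied metric."""
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    lower = centre - n_mads * mad
    upper = centre + n_mads * mad
    return [v < lower or v > upper for v in values]


# Hypothetical per-sample deletion counts for one population/platform group.
deletion_counts = [100, 102, 98, 101, 99, 103, 97, 160]
flags = mad_outliers(deletion_counts)
print([v for v, is_outlier in zip(deletion_counts, flags) if is_outlier])
```

Because both the centre and the spread are medians, a single extreme sample barely shifts the thresholds, which is the usual reason for preferring this rule over a mean and standard deviation cut-off.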
Extended Data Fig. 4 | Variant discovery at large sample sizes. a, b, The total
number of variants observed (a) and the proportion of possible variants
observed (b) as a function of sample size, broken down by variant class. At large
sample sizes, CpG transitions become saturated, as previously described.
Colours are consistent in a and b. c, This results in a decrease of the transition/
transversion (Ti/Tv) ratio. d, When broken down by functional class, we 
observe the effects of selection, in which synonymous variants have the
highest proportion observed, followed by missense and pLoF variants. e, f, The 
number of additional pLoF variants introduced into the cohort as a function of 
sample size on a log (e) and linear (f) scale. Here, gnomAD (black) refers to a
uniform sampling from the population distribution of the full cohort of 
exome-sequenced individuals. 
Extended Data Fig. 5 | Using LOFTEE to create a high-confidence set of pLoF
variation. a, Schematic of LOFTEE filters. LOFTEE filters out putative
stop-gained, essential splice, and frameshift variants based on sequence and
transcript context, as well as flagging exonic features such as conservation (not
shown). For instance, variants that are not predicted to disrupt splicing based
on retention of a strong splice site, or rescue of a nearby splice site. Additional
filters not shown include: ANC_ALLELE (the alternative allele is the ancestral
allele), NON_ACCEPTOR_DISRUPTING and DONOR_RESCUE (opposite to those
already shown). b, To tune the END_TRUNC filter, we retained variants that pass
the 50-bp rule (are more than 50 bp before the 3’-most splice site). The overall 
MAPS score for variants that fail this rule is shown in grey. For the remaining
39,072 variants, we computed the sum of the genomic evolutionary rate 
profiling (GERP) score of bases deleted by the variant. At 40 bins of this score, 
we compute the MAPS score for those variants retained at this threshold (red) 
compared to variants removed at this threshold (blue), and plot this as a
function of the proportion of variants filtered at this threshold. We chose the 
50% point as it retains variants with a MAPS score of 0.14, while removing 
variants with a MAPS score of 0.06. Error bars represent 95% confidence 
intervals. c, Density plot of aggregate pLoF frequency computed from 
high-confidence pLoF variants discovered using LOFTEE. 
Extended Data Fig. 6 | Computing the depletion of variation of functional 
categories. a, The distribution of mean methylation values across 37 tissues 
and across every CpG dinucleotide in the genome. We divided the genome into 
3 levels (low methylation, missing or < 0.2; medium, 0.2-0.6; and high, >0.6) 
and computed all ensuing metrics based on these categories. b, Comparison of 
estimates of the mutation rate with previous estimates. For transversions and
non-CpG transitions, we observe a strong correlation (linear regression 
r=0.98; P=2.6 x10). For CpG transitions, the new estimates are calculated 
separately for the three levels of methylation and track with these levels. 
Colours and shapes are consistent in b-d. c, For c-e, only synonymous variants
are considered. The proportion of possible variants observed for each context 
is correlated with the mutation rate. We compute two fit lines, one for CpG


transitions, and one for other contexts to calibrate our estimates. 

d, Calibration of each context to compute a predicted proportion observed 
after fitting the two models in c, which is used to calculate an expected number
of variants at high coverage. e, With an expectation computed from high 
coverage regions, the observed/expected ratio follows a logarithmic trend
with the median coverage below 40x, which is used to correct low coverage
bases in the final expectation model. f-h, For each transcript, the observed 
number of variants is plotted against the expected number from the model 
described above, for synonymous (f), missense (g), and pLoF (h) variants, and 
the linear regression coefficient is shown. Note that the expectation does not 
include selection, and so, pLoF and, to a lesser extent, missense variants exhibit
lower observed values than expected. 
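
The three methylation levels described in panel a amount to a simple threshold rule, sketched below in Python. The function name `methylation_level` is hypothetical, and assigning the exact boundary values 0.2 and 0.6 to the medium level is an assumption based on the ranges as printed in the caption.

```python
from typing import Optional


def methylation_level(mean_methylation: Optional[float]) -> str:
    """Assign a CpG site to a methylation level from its mean methylation
    across tissues: low (missing or < 0.2), medium (0.2-0.6), high (> 0.6)."""
    if mean_methylation is None or mean_methylation < 0.2:
        return "low"
    if mean_methylation <= 0.6:
        return "medium"
    return "high"


# Hypothetical mean methylation values, including a site with no data.
for value in (None, 0.05, 0.4, 0.85):
    print(value, methylation_level(value))
```

Treating missing values as low methylation keeps every CpG site in one of the three categories, so the mutation rate for CpG transitions can be estimated separately for each level.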
Extended Data Fig. 7 | Genomic properties of constrained genes.

a, b, Histogram of the observed/expected ratio of pLoF variation (a) and LOEUF 
(b). Most genes have fewer observed variants than expected (median observed/ 
expected = 0.48), and the genes with no observed pLoFs are distinguished 
between confidently constrained genes and noise by LOEUF. c, A 2D density
plot of the number of observed versus expected pLoF variants. The boundaries 
of each decile are plotted as gradients (that is, the most constrained decile is 
below the lowest red line). d, The LOEUF of a gene is correlated with its coding 
sequence length (beta=-1.07 x 10+; P< 10): thus, for all downstream
statistical tests, we adjust for gene length or remove genes with fewer than 10 
expected pLoFs. e, Observed/expected ratios of various functional classes 
across genes within each LOEUF decile. The most constrained decile has
approximately 6% of the expected pLoFs, while synonymous variants are not 
depleted and missense variants exhibit modest depletion. f, The percentage of 
each LOEUF decile that was described in ExAC as constrained, or pLI > 0.9.

g, The percentage of each LOEUF decile that have at least one homozygous 
pLoF variant. h, Box plots of the aggregate pLoF frequency for each LOEUF 
decile. Centre line denotes the median; box limits denote upper and lower 
quartiles; whiskers denote 1.5x the interquartile range; points denote outliers.
In e-g, error bars represent 95% confidence intervals (note that in some cases
these are fully contained within the plotted point). 
transcript-based LOEUF decile, is shown for all transcripts and canonical population size. e, The mean LOEUF score for 865 genes with expected pLoF
permuted set (blue). d, For 927 genes with expected pLoF =10 in both the
Extended Data Fig. 9 | Applications of constraint metrics to rare variant
analysis of disease. a, Proportion of each LOEUF decile found in OMIM. 

b, Proportion of disease-associated genes discovered by whole-exome/ 
genome sequencing (WES/WGS) compared to conventional (typically 
linkage) methods, plotted by LOEUF decile. The former are more constrained 
(LOEUF 0.674 versus 0.806, two-sided t-test P=1.2 x 10"), which suggests that 
these techniques are more effective for picking up genes with a de novo 
mechanism of disease, compared to recessive genes identified by linkage
methods. c, Similar to Fig. 5a, the rate ratio is defined by the rate of de novo
variants (number per patient) in autism cases divided by the rate in controls. 
pLoF variants in the most constrained decile of the genome are approximately 
fourfold more likely to be found in cases compared to controls. d, The mean 
odds ratio of a logistic regression of schizophrenia”* is plotted for each LOEUF
decile. Error bars in a-d correspond to 95% confidence intervals.
Extended Data Fig. 10 | Applications of constraint metrics to common
variant analysis of disease. a, The τ* coefficient (see Supplementary
Information) for each LOEUF decile across 276 independent traits. Unlike the
enrichment measure reported in Fig. 5, τ* is adjusted for 74 baseline genomics
annotations. Positive values of τ* indicate greater per-SNP heritability than
would be expected based on the other annotations in the baseline model,
whereas negative values indicate depleted per-SNP heritability compared to
that baseline expectation. b, Enrichment coefficient for each LOEUF decile
using different window sizes to define which SNPs to include upstream and 
downstream of each gene. c, Enrichment coefficient for each LOEUF decile 
across traits after controlling for brain expression and gene size. Results are 
consistent with those shown in Fig. 5, which indicates that brain gene 
expression and gene size do not fully explain the enrichment of heritability 
observed in constrained genes. Error bars represent 95% confidence intervals. 
Life sciences study design

Sample size: This study was opportunistic, and involved secondary use of all available genome and exome data. No sample size was predetermined.
Nevertheless, the current sample size enables the accurate assessment of constraint against pLoF variation for over 72% of genes in the
human genome (see Figure 2).

Data exclusions: Sample QC and variant QC for gnomAD are described extensively in the supplementary methods. Notably, individuals with severe pediatric
disease, and known first-degree relatives of those with severe pediatric disease, were excluded, as previously established and described [Lek
et al., 2016].

Replication: We did not attempt to reproduce any findings in a separate dataset, as no other dataset of comparable size exists.

Randomization: As this was a population-based study, and not a case-control study, no randomization was performed.

Blinding: As this was a population-based study, and not a case-control study, blinding was not relevant.

Population characteristics: As an opportunistic collection of data, the participants in this study were not selected based on age, gender, or genotypic
information. As described above, individuals with severe pediatric disease, and known first-degree relatives of those with severe
pediatric disease, were excluded. The populations are provided in Supplementary Table 7, and there are 64,754 females and
76,702 males. These data were obtained primarily from case-control studies of adult-onset common diseases, including
cardiovascular disease, type 2 diabetes, and psychiatric disorders.

Recruitment: As this was an opportunistic secondary use study, we did not recruit any participants.

Ethics oversight: This study was overseen by the Broad Institute’s Office of Research Subject Protection and the Partners Human Research
Committee, and was given a determination of Not Human Subjects Research.
Structural variants (SVs) rearrange large segments of DNA‘ and can have profound
consequences in evolution and human disease?’. As national biobanks,
disease-association studies, and clinical genetic testing have grown increasingly
reliant on genome sequencing, population references such as the Genome
Aggregation Database (gnomAD)* have become integral in the interpretation of
single-nucleotide variants (SNVs)°. However, there are no reference maps of SVs from
high-coverage genome sequencing comparable to those for SNVs. Here we present a
reference of sequence-resolved SVs constructed from 14,891 genomes across diverse
global populations (54% non-European) in gnomAD. We discovered a rich and
complex landscape of 433,371 SVs, from which we estimate that SVs are responsible 
for 25-29% of all rare protein-truncating events per genome. We found strong 
correlations between natural selection against damaging SNVs and rare SVs that 
disrupt or duplicate protein-coding sequence, which suggests that genes that are 
highly intolerant to loss-of-function are also sensitive to increased dosage’. We also 
uncovered modest selection against noncoding SVs in cis-regulatory elements, 
although selection against protein-truncating SVs was stronger than all noncoding 
effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of 
samples, and estimate that 0.13% of individuals may carry an SV that meets the 
existing criteria for clinically important incidental findings’. This SV resource is freely 
distributed via the gnomAD browser® and will have broad utility in population 
genetics, disease-association studies, and diagnostic screening. 


SVs are DNA rearrangements that involve at least 50 nucleotides'.
By virtue of their size and abundance, SVs represent an important
mutational force that shapes genome evolution and function?’, and
contribute to germline and somatic diseases” “. The profound effect
of SVs is also attributable to the numerous mechanisms by which they
can disrupt protein-coding genes and cis-regulatory architecture”.
SVs can be grouped into mutational classes that include ‘unbalanced’
gains or losses of DNA (for example, copy-number variants, CNVs),
and ‘balanced’ rearrangements that occur without corresponding
dosage alterations (such as inversions and translocations)! (Fig. 1a).
Other common forms of SVs include mobile elements that insert themselves
throughout the genome, and multiallelic CNVs (MCNVs) that can
exist at high copy numbers". More recently, exotic species of complex
SVs have been discovered that involve two or more distinct SV signatures
in a single mutational event interleaved on the same allele, and
can range from CNV-flanked inversions to rare instances of localized
chromosome shattering, such as chromothripsis”*. The diversity of
SVs in humans is therefore far greater than has been widely appreciated,
as is their influence on genome structure and function.

Although SVs alter more nucleotides per genome than SNVs and 
short insertion/deletion variants (indels; <50 bp)’, surprisingly little 
is known about their mutational spectra on a global scale. The largest
published population study of SVs using whole-genome sequencing
(WGS) remains the 1000 Genomes Project (n = 2,504; 7× sequence


Lists of affiliations and consortium members appear at the end of the paper.

Fig. 1 | Properties of SVs across human populations. a, SV classes catalogued
in this study. We also documented unresolved non-reference ‘breakends’
(BNDs), but they were excluded from all analyses as low-quality variants.
b, After quality control, we analysed 14,237 samples across continental
populations, including African/African American (AFR), Latino (AMR), East
Asian (EAS), and European (EUR), or other populations (OTH). Three publicly
available WGS-based SV datasets are provided for comparison (1000 Genomes
Project (1000G), approximately 7× coverage; Genome of the Netherlands
Project (GoNL), around 13× coverage; Genotype-Tissue Expression Project
(GTEx), approximately 50× coverage)!"*"”. c, We discovered 433,371 SVs, and
provide counts from previous studies for comparison"*”. d, A principal
component (PC) analysis of genotypes for 15,395 common SVs separated
samples along axes corresponding to genetic ancestry. e, The median genome
contained 7,439 SVs. f, Most SVs were small. Expected Alu, SVA and LINE1
mobile element insertion peaks are marked at approximately 300 bp, 2.1 kb and
6 kb, respectively. g, Most SVs were rare (allele frequency (AF) < 1%), and 49.8%
of SVs were singletons (solid bars). h, Allele frequencies were inversely
correlated with SV size across all 335,470 resolved SVs in unrelated individuals.
Values are mean and 95% confidence interval from 100-fold bootstrapping.
Colour codes are consistent between a, c, e-h, and between b and d.


coverage)', and the substantial technical challenges of SV discovery
from WGS* have led to non-uniform SV analyses across contemporary
studies’ ?°. Moreover, short-read WGS is unable to capture a subset
of SVs accessible to more expensive niche technologies, such as
long-read WGS”!. Owing to the combination of these challenges, SV
references are dwarfed by contemporary resources for short variants,
such as the Exome Aggregation Consortium (ExAC) and its successor,
the Genome Aggregation Database (gnomAD), which have jointly analysed
more than 140,000 individuals**. Publicly available resources
such as ExAC and gnomAD have transformed many aspects of human
genetics, including defining sets of genes constrained against damaging
coding mutations‘ and providing frequency filters for variant
interpretation®. As short-read WGS is rapidly becoming the predominant
technology in large-scale human disease studies, and will probably
displace conventional methods for diagnostic screening, there
is a mounting need for comparable references of SVs across global
populations.

In this study, we developed gnomAD-SV, a sequence-resolved reference
for SVs from 14,891 genomes. Our analyses revealed diverse mutational
patterns among SVs, and principles of selection acting against
reciprocal dosage changes in genes and noncoding cis-regulatory
elements. From these analyses, we determined that SVs represent more
than 25% of all rare protein-truncating events per genome, emphasizing
the unrealized potential of routine SV detection in WGS studies. This
SV reference has been integrated into the gnomAD browser
(http://gnomad.broadinstitute.org) with no restrictions on reuse so that it
can be mined for new insights into genome biology and applied as a
resource to interpret SVs in diagnostic screening.


SV discovery and genotyping 


We analysed WGS data for 14,891 samples (average coverage of 32x) 
aggregated from large-scale sequencing projects, of which 14,237
(95.6%) passed all quality thresholds, representing a general adult population
depleted for severe Mendelian diseases (median age of 49 years)
(Supplementary Table 1, Supplementary Figs. 1, 2). This cohort included
46.1% European, 34.9% African or African American, 9.2% East Asian,
and 8.7% Latino samples, as well as 1.2% samples from admixed or other
populations (Fig. 1). Following family-based analyses using 970 parent-child
trios for quality assessments, we pruned all first-degree relatives
from the cohort, retaining 12,653 unrelated genomes for subsequent
analyses. 

We discovered and genotyped SVs using a cloud-based, 
multi-algorithm pipeline for short-read WGS (Supplementary Fig. 3), 
which we prototyped in a study of 519 autism quartet families”°. This 
pipeline integrated four orthogonal evidence types to capture SVs 
across the size and allele frequency spectra, including six classes of 
canonical SVs (Fig. 1a) and 11 subclasses of complex SVs” (Fig. 2). We 
augmented this pipeline with new methods to account for the technical
heterogeneity of aggregated datasets (Extended Data Fig. 1, Supplementary
Figs. 4, 5), and discovered 433,371 SVs (Fig. 1c). After excluding
low-quality SVs, which were predominantly (61.6%) composed of
incompletely resolved breakpoint junctions (that is, ‘breakends’) that
lack interpretable alternative allele structures for functional annotation
and produce high false-discovery rates”° (Extended Data Fig. 2a),
we retained 335,470 high-quality SVs for subsequent analyses (Supplementary
Table 3). This final set of high-quality SVs corresponded
to a median of 7,439 SVs per genome, or more than twice the number
of variants per genome captured by previous WGS-based SV studies
such as the 1000 Genomes Project (3,441 SVs per genome from
approximately 7× coverage WGS), which underscores the benefits of
high-coverage WGS and improved multi-algorithm ensemble methods 
for SV discovery. 

Given that there are no gold-standard benchmarking procedures 
for SVs from WGS, we evaluated the technical qualities of gnomAD-SV 
using seven orthogonal approaches. These analyses are described in 
detail in Extended Data Figs. 2, 3, Supplementary Figs. 6-12, Supplementary
Table 4 and Supplementary Note 1, but we highlight just a few
here to demonstrate that gnomAD-SV conforms to many fundamental
principles of population genetics, including Mendelian segregation,
genotype distributions, and linkage disequilibrium. We found that the
precision of gnomAD-SV was comparable to our previous study of 519
autism quartets that attained a 97% molecular validation rate for all
de novo SV predictions”°: in gnomAD, analyses of 970 parent-child
trios indicated a median Mendelian violation rate of 3.8% and a heterozygous
de novo rate of 3.0%. We also observed that 86% of SVs were
in Hardy-Weinberg equilibrium, and common SVs were in strong linkage
disequilibrium with nearby SNVs or indels (median peak R² = 0.85). We
performed extensive in silico confirmation of 19,316 SVs predicted from
short-read WGS using matched long-read WGS from four samples””?,
finding a 94.0% confirmation rate with breakpoint-level read evidence,
and revealing that 59.8% of breakpoint coordinates were accurate within
a single nucleotide of the long-read data. These and other benchmarking
approaches suggested that gnomAD-SV was sufficiently sensitive
and specific to be used as a reference dataset for most applications in
human genomics. 
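
As an illustration of the Hardy-Weinberg check mentioned above, the following minimal sketch tests a single biallelic site with a one-degree-of-freedom chi-square test. The function name `hwe_chi_square` and the genotype counts are hypothetical, and the choice of a chi-square test is an assumption made for illustration; the test and thresholds actually applied to gnomAD-SV are described in the supplementary material and may differ.

```python
import math


def hwe_chi_square(n_ref_hom: int, n_het: int, n_alt_hom: int) -> tuple[float, float]:
    """Chi-square test (1 d.f.) of Hardy-Weinberg proportions at a biallelic site.

    Returns (chi-square statistic, P value). Hypothetical helper for illustration.
    """
    n = n_ref_hom + n_het + n_alt_hom
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    observed = (n_ref_hom, n_het, n_alt_hom)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip genotype classes with zero expectation (monomorphic sites).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts: (hom-ref, het, hom-alt)
print(hwe_chi_square(100, 200, 100))  # genotypes in exact HWE proportions
print(hwe_chi_square(150, 100, 150))  # heterozygote deficit
```

In a cohort-scale setting, the P-value threshold would need to account for the number of sites tested, and an exact test is often preferred for rare alleles.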
Fig. 2 | Complex SVs are abundant in the human genome. We resolved 5,295
complex SVs across 11 mutational subclasses, 73.7% of which involved at least 
one inversion. Each subclass is detailed here, including their mutational 
signatures, structures, abundance, density of SV sizes (vertical line indicates 


Population genetics and genome biology 


The distribution of SVs across samples matched expectations based 
on human demographic history, with the top three components of
genetic variance separating continental populations (Fig. 1d, Supplementary
Fig. 13). African and African American samples exhibited
the greatest genetic diversity and their common SVs were in weaker
linkage disequilibrium with nearby short variants than Europeans,
whereas East Asians featured the highest levels of homozygosity
(Fig. 1e, Extended Data Fig. 4a-d, Supplementary Fig. 7). The mutational
diversity of gnomAD-SV was extensive: we completely resolved
5,295 complex SVs across 11 mutational subclasses, of which 3,901
(73.7%) involved inverted segments (Fig. 2), confirming that inversion
variation is predominantly composed of complex SVs rather than
canonical inversions’. Across all SV classes, most SVs were small
(median size of 331 bp) and rare (allele frequency < 1%; 92% of SVs),
with half of all SVs (49.8%) appearing as ‘singletons’ (that is, only one
allele observed across all samples) (Fig. 1f, g). Although the proportion
of singletons varied by SV class, it was strongly dependent on SV size
across all classes, which suggests that the amount of DNA rearranged
is a key determinant of selection against most SVs (Fig. 1h, Extended
Data Fig. 5a).

Mutation rate estimates for SVs have remained elusive owing to 
limited sample sizes, poor resolution of conventional technologies, 
technical challenges of SV discovery, and use of cell line-derived DNA 
in population studies’. Here, we used the Watterson estimator” to
project a mean mutation rate of 0.29 de novo SVs (95% confidence interval
0.13-0.44) per generation in regions of the genome accessible to
short-read WGS, or roughly one new SV every 2-8 live births, with mutation
rates varying markedly by SV class (Fig. 3a). Although this imperfect
method extrapolates from data pooled across unrelated individuals, we
previously demonstrated comparable rates from molecularly validated
observations in 519 quartet families”°. Like mutation rates, the distribution
of SVs throughout the genome was non-uniform, significantly
correlated with repetitive sequence contexts, and was enriched near 
centromeres and telomeres” (Supplementary Fig. 16). These trends 
were dependent on SV class, as biallelic deletions and duplications were 
predominantly enriched at telomeres, whereas MCNVs were enriched 
in centromeric segmental duplications (Fig. 3b-d). Given the reduced 
sensitivity of short-read WGS in repetitive sequences, this study certainly
underestimates the true SV mutation rates; nevertheless, these

median size), and allele frequencies. Five pairs of subclasses have been
collapsed into single rows due to mirrored or similar alternative allele 
structures (for example, delINV versus INVdel). Two complex SVs did not 
conform to any subclass (Extended Data Fig. 8). 


analyses implicate several aspects of chromosomal context and SV class 
in determining SV mutation rates throughout the genome. 
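
The Watterson estimator referred to above can be sketched in a few lines. This is a generic textbook form, not the study's implementation: the function names, the example counts, and the effective population size are all hypothetical, and the relation μ = θ / (4Nₑ) assumes the standard neutral model for a diploid population. The last helper simply converts a per-generation rate into "one new variant every k births", which is how the quoted interval of 0.13-0.44 maps to roughly 2-8 live births.

```python
def watterson_a(n_chromosomes: int) -> float:
    """a_n = sum_{i=1}^{n-1} 1/i for a sample of n chromosomes."""
    if n_chromosomes < 2:
        raise ValueError("need at least two chromosomes")
    return sum(1.0 / i for i in range(1, n_chromosomes))


def watterson_theta(n_variant_sites: int, n_chromosomes: int) -> float:
    """Watterson's theta: number of segregating sites divided by a_n."""
    return n_variant_sites / watterson_a(n_chromosomes)


def mutation_rate(theta: float, effective_size: float) -> float:
    """mu = theta / (4 * Ne); Ne is an assumed input, not taken from the text."""
    return theta / (4.0 * effective_size)


def births_per_new_variant(rate_per_generation: float) -> float:
    """Expected number of births per new variant for a given per-generation rate."""
    return 1.0 / rate_per_generation


# Hypothetical inputs: 1,000 variant sites, 200 chromosomes, Ne = 10,000.
theta = watterson_theta(1_000, 200)
print(mutation_rate(theta, 10_000))

# Bounds of the confidence interval quoted in the text.
print(births_per_new_variant(0.44), births_per_new_variant(0.13))
```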


Dosage sensitivity of coding and noncoding loci 

Owing to their size and mutational diversity, SVs can have varied consequences
on protein-coding genes” (Fig. 4a, Supplementary Fig. 17).
In principle, any SV can result in predicted loss-of-function (pLoF),
either by deleting coding nucleotides or altering open-reading frames.
Coding duplications can result in copy-gain of entire genes, or of a
subset of exons within a gene (referred to here as intragenic exonic

Fig. 3 | Genome-wide mutational patterns of SVs. a, Mutation rates (μ) from
the Watterson estimator for each SV class”. Bars represent 95% confidence
intervals. Rates of molecularly validated de novo SVs from 519 quartet families
are provided for comparison”’. b, Smoothed enrichment of SVs per 100-kb
window across the average of all autosomes normalized by chromosome arm
length (a ‘meta-chromosome’) (Supplementary Fig. 16). c, The distribution of
SVs along the meta-chromosome was dependent on variant class. d, SV
enrichment by class and chromosomal position provided as mean and 95%
confidence intervals (CI). C, centromeric; I, interstitial; T, telomeric. P values
were computed using a two-sided t-test and were Bonferroni-adjusted for
21 comparisons. *P < 2.38 × 10⁻³.


duplication, or IED). The average genome in gnomAD-SV contained a
mean of 179.8 genes altered by biallelic SVs (144.3 pLoF, 24.3 copy-gain,
and 11.2 IED), of which 11.6 were predicted to be completely inactivated
by homozygous pLoF (Fig. 4b, Extended Data Fig. 4e-h). When
restricted to rare (allele frequency < 1%) SVs, we observed a mean of 
10.2 altered genes per genome (5.5 pLoF, 3.4 copy-gain, and 1.3 IED). 
By comparison, a companion gnomAD paper estimated 122.4 pLoF 
short variants per genome, of which 16.3 were rare*. These analyses 
suggest that 29.4% of rare heterozygous gene inactivation events per 
individual are contributed by SVs, or conservatively 25.2% of pLoF 
events if we exclude IEDs given the context-dependence of their 
functional impact. 
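
The two percentages above appear to follow directly from the per-genome means quoted in this paragraph; that reading is an inference from the numbers rather than a derivation stated in the text. A minimal check, with hypothetical variable and function names:

```python
def sv_share(sv_events: float, short_variant_events: float) -> float:
    """Fraction of events contributed by SVs among SVs plus short variants."""
    return sv_events / (sv_events + short_variant_events)


# Per-genome means of rare events, as quoted in the text.
rare_plof_svs = 5.5
rare_ied_svs = 1.3
rare_plof_short_variants = 16.3

with_ieds = sv_share(rare_plof_svs + rare_ied_svs, rare_plof_short_variants)
without_ieds = sv_share(rare_plof_svs, rare_plof_short_variants)

print(f"{with_ieds:.1%}")     # pLoF plus IED
print(f"{without_ieds:.1%}")  # pLoF only (conservative)
```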

A fundamental question in human genetics is the degree to which 
natural selection acts on coding and noncoding loci. The proportion 
of singleton variants has been established as a proxy for strength of 
selection®; however, this metric is confounded for SVs given the strong 
correlation between allele frequency and SV size, among other factors. 
Therefore, we developed a new metric, adjusted proportion of singletons
(APS), to account for SV class, size, genomic context, and other
technical covariates (Extended Data Fig. 5, Supplementary Fig. 14).
Under this normalized APS metric, a value of zero corresponds to a singleton
proportion comparable to intergenic SVs, whereas values greater
than zero reflect purifying selection, similar to the ‘mutability-adjusted
proportion of singletons’ (MAPS) metric used for SNVs°. Applying this
APS model revealed signals of pervasive selection against nearly all
classes of SVs that overlap genes, including intronic SVs, whole-gene
inversions, SVs in gene promoters, and deletions as small as a single
exon (Fig. 4c, Extended Data Fig. 6, Supplementary Fig. 18). The one
notable exception was copy-gain duplications, which showed no clear
evidence of selection beyond what could already be explained by their
sizes, which were vastly larger than non-copy-gain duplications (median
copy-gain duplication size = 134.8 kb; median non-copy-gain duplication
size = 2.7 kb; one-tailed Wilcoxon test, W = 1.18 × 10⁸, P<10™°). This
result could have numerous explanations, but it is consistent with the 
known diverse evolutionary roles of gene duplication events, including 
positive selection reported in humans”””’. 
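
To make the idea behind APS concrete, the sketch below uses a deliberately simplified stand-in: the expected singleton proportion for each SV is taken from intergenic SVs in the same stratum (SV class and a coarse size bin), and the score for a functional category is the mean of observed minus expected. This stratified baseline is an assumption chosen for brevity and is not the model used in the study, which accounts for further covariates (Extended Data Fig. 5). All names, bins, and data here are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    sv_class: str    # e.g. "DEL" or "DUP"
    size_bin: str    # hypothetical coarse size label, e.g. "<1kb"
    context: str     # "intergenic" or a functional category such as "pLoF"
    singleton: bool  # True if only one allele was observed


def intergenic_baseline(svs):
    """Singleton proportion among intergenic SVs, per (class, size bin) stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [singletons, total]
    for sv in svs:
        if sv.context == "intergenic":
            entry = counts[(sv.sv_class, sv.size_bin)]
            entry[0] += sv.singleton
            entry[1] += 1
    return {stratum: s / t for stratum, (s, t) in counts.items()}


def adjusted_singleton_score(svs, context, baseline):
    """Mean of (observed singleton status - stratum baseline) for one category."""
    residuals = [
        sv.singleton - baseline[(sv.sv_class, sv.size_bin)]
        for sv in svs
        if sv.context == context and (sv.sv_class, sv.size_bin) in baseline
    ]
    if not residuals:
        raise ValueError(f"no SVs with a matching baseline for {context!r}")
    return sum(residuals) / len(residuals)


# Hypothetical toy data: half of intergenic deletions are singletons,
# three quarters of pLoF deletions are.
toy = (
    [SV("DEL", "<1kb", "intergenic", s) for s in (True, True, False, False)]
    + [SV("DEL", "<1kb", "pLoF", s) for s in (True, True, True, False)]
)
baseline = intergenic_baseline(toy)
print(adjusted_singleton_score(toy, "pLoF", baseline))        # positive
print(adjusted_singleton_score(toy, "intergenic", baseline))  # zero by construction
```

By construction, intergenic SVs score zero and categories with an excess of singletons score above zero, which matches the interpretation of APS given in the text.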

Methods that quantify evolutionary constraint on a per-gene basis,
such as the probability of intolerance to heterozygous pLoF variation
(pLI)° and the pLoF observed/expected upper fraction (LOEUF)*, have
become core resources in human genetics. Nearly all existing metrics, 
including pLI and LOEUF, are derived from SNVs. Although previous 
studies have attempted to compute similar scores using large CNVs 
detected by microarray and exome sequencing”, or to correlate 
deletions with pLI’, no gene-level metrics comparable to LOEUF exist 
for SVs at WGS resolution. To gain insight into this problem, we built 
a model to estimate the depletion of rare SVs per gene compared to 
expectations based on gene length, genomic context, and the structure 
of exons and introns. This model is imperfect, as current sample sizes 
are too sparse to derive precise gene-level metrics of constraint from 
SVs. Nevertheless, we found strong concordance between the depletion
of rare pLoF SVs and existing pLoF and missense SNV constraint
metrics‘ (pLoF Spearman correlation test, ρ = 0.90, P<10) (Fig. 4d,
Supplementary Fig. 19). Notably, a comparable positive correlation was
also observed for copy-gain SVs and SNV constraint (pLoF Spearman
correlation test, ρ = 0.78, P< 10°), whereas a weaker yet significant
correlation was detected for IEDs (pLoF Spearman correlation test,
ρ = 0.58, P=2.0 x10). As orthogonal support for these trends, we
identified an inverse correlation between APS and SNV constraint 
across all functional categories of SVs, which was consistent with 
our observed depletion of rare, functional SVs in constrained genes 
(Extended Data Fig. 6f). These comparisons confirm that selection 
against most classes of gene-altering SVs mirrors patterns observed 
for short variants'®*°. They further suggest that SNV-derived constraint 
metrics such as LOEUF capture a general correspondence between 
haploinsufficiency and triplosensitivity for a large fraction of genes in

Fig. 4 | Pervasive selection against SVs in genes mirrors coding short
variants. a, Four categories of gene-overlapping SVs, with counts of total SVs,
median SV size, and mean SVs per gene in gnomAD-SV. b, Count of genes
altered by SVs per genome. Horizontal lines indicate medians. Sample sizes per
category listed in Supplementary Table 9. c, APS value for SVs overlapping
genes. Bars indicate 100-fold bootstrapped 95% confidence intervals. SVs per
pLoF SNVs versus gene-overlapping SVs in 100 bins of around 175 genes each,
ranked by SNV constraint*. Correlations were assessed with a two-sided
Spearman correlation test. Solid lines represent 21-point rolling means.
See Supplementary Fig. 19 for comparisons to missense constraint.


the genome. It therefore appears that the most highly pLoF-constrained 
genes not only are sensitive to pLoF, but also are more likely to be intol- 
erant to increased dosage and other functional alterations. 
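
The rank correlation used for the binned comparison (100 bins of around 175 genes each, per the Fig. 4 legend) can be sketched as follows. This shows only the Spearman coefficient itself, computed as the Pearson correlation of average ranks; the bin values are hypothetical, no P value is computed, and the per-gene depletion model that produces the inputs is not reproduced here.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-bin observed/expected ratios for two variant types.
snv_oe = [0.10, 0.25, 0.40, 0.55, 0.80, 1.00]
sv_oe = [0.30, 0.35, 0.50, 0.45, 0.90, 1.10]
print(spearman(snv_oe, sv_oe))
```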

In contrast to the well-studied effects of coding variation, the effects
of noncoding SVs on regulatory elements are largely unknown. There are
a handful of examples of SVs with strong noncoding effects, although
they are scarce in humans and model organisms”. In gnomAD-SV,
we explored noncoding dosage sensitivity across 14 regulatory element
classes, ranging from high-confidence experimentally validated
enhancers to large databases of computationally predicted elements
(Supplementary Table 5). We found that noncoding CNVs overlapping
most element classes had increased proportions of singletons, although
none exceeded the APS observed for pLoF SVs (Fig. 5a). In general, the
effects of noncoding deletions appeared stronger than noncoding
duplications, and CNVs predicted to delete or duplicate entire elements
were under stronger selection than partial element disruption
(Fig. 5b). We also observed that primary sequence conservation was 
correlated with selection against noncoding CNVs (Fig. 5c, d), which 
provides a foothold for future work on interpretation and functional 
effect prediction for noncoding SVs. Broadly, these results followed 
trends we observed for protein-coding SVs, which we interpreted as 
evidence for weak but widespread selection against CNVs altering 
most classes of annotated regulatory elements. 


Trait association and clinical genetics 


Most large-scale trait association studies have only considered SNVs 
in genome-wide association studies (GWAS). Taking advantage of 
the sample size and resolution of gnomAD-SV, we evaluated whether 
SNVs associated with human traits might be in linkage disequilibrium
with SVs not directly genotyped in GWAS. We identified 15,634
common SVs (allele frequency > 1%) in strong linkage disequilibrium
(R² > 0.8) with at least one common short variant (Supplementary
bootstrapping. Each category was compared to neutral variation (APS = 0)
Fig. 7), 14.8% of which matched a reported association from the
NHGRI-EBI GWAS catalogue or a recent analysis of 4,203 phenotypes
in the UK Biobank****. Common SVs in linkage disequilibrium with
GWAS variants were enriched for genic SVs across multiple functional
categories (Supplementary Table 6), and included candidate SVs such
as a deletion of a thyroid enhancer in the first intron of ATP6V0D1
at a hypothyroidism-associated locus* (Extended Data Fig. 7). We
also identified matches for previously proposed causal SVs tagged 
by common SNVs, including pLoF deletions of CFHR3 or CFHR1 in 
nephropathies and of LCE3B or LCE3C in psoriasis®”°. These results 
demonstrate the value of imputing SVs into GWAS, and for the eventual 
unification of short variants and SVs in all trait association studies. 
Given the potential value of this resource, we have released these link- 
age disequilibrium maps in Supplementary Table 7. 
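
A minimal sketch of how an SV might be matched to tagging short variants at the R² > 0.8 threshold used above. It computes the squared Pearson correlation of unphased allele dosages (0, 1 or 2 copies per sample), which is a common approximation to haplotype-based linkage disequilibrium and is an assumption here; the study's own procedure is not described in this section. Function names, variant identifiers, and genotypes are hypothetical.

```python
def dosage_r_squared(sv_dosage, snv_dosage):
    """Squared Pearson correlation between two per-sample allele dosage vectors."""
    if len(sv_dosage) != len(snv_dosage) or len(sv_dosage) < 2:
        raise ValueError("dosage vectors must have equal length of at least 2")
    n = len(sv_dosage)
    mean_sv = sum(sv_dosage) / n
    mean_snv = sum(snv_dosage) / n
    cov = sum((a - mean_sv) * (b - mean_snv) for a, b in zip(sv_dosage, snv_dosage))
    var_sv = sum((a - mean_sv) ** 2 for a in sv_dosage)
    var_snv = sum((b - mean_snv) ** 2 for b in snv_dosage)
    if var_sv == 0 or var_snv == 0:
        return 0.0  # monomorphic in this sample: treated as uninformative
    return cov * cov / (var_sv * var_snv)


def tagging_variants(sv_dosage, snv_dosages, threshold=0.8):
    """Names of short variants whose dosage R² with the SV exceeds the threshold."""
    return [
        name
        for name, dosage in snv_dosages.items()
        if dosage_r_squared(sv_dosage, dosage) > threshold
    ]


# Hypothetical genotypes for six samples.
sv = [0, 1, 2, 0, 1, 2]
nearby = {
    "snv_a": [0, 1, 2, 0, 1, 2],  # same pattern as the SV
    "snv_b": [2, 1, 0, 2, 1, 0],  # opposite allele coding, still fully correlated
    "snv_c": [0, 2, 0, 2, 0, 2],  # uncorrelated with the SV
}
print(tagging_variants(sv, nearby))
```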

As genomic medicine advances towards diagnostic screening at 
sequence resolution, computational methods for variant discovery 
from WGS and population references for interpretation will become 
indispensable. One category of disease-associated SVs, recurrent CNVs 
mediated by homologous segmental duplications known as genomic 
disorders, are particularly important because they collectively represent
a common cause of developmental disorders”. Accurate detection
of large, repeat-mediated CNVs is thus crucial for WGS-based diagnostic
testing as chromosomal microarray is the recommended first-tier diagnostic
screen at present for unexplained developmental disorders”.
Using gnomAD-SV, we evaluated our ability to detect genomic disorders
in WGS data by calculating CNV carrier frequencies for 49 genomic
disorders across 10,047 unrelated samples with no known neuropsychiatric
disease and found that CNV carrier frequencies in gnomAD-SV
were consistent with those reported from chromosomal microarray in
the UK Biobank”® (R² = 0.669; Pearson correlation test, P=7.38 x 10°)
(Fig. 6a, Supplementary Table 8, Supplementary Fig. 20). The frequencies
of carriers of genomic disorders did not vary significantly among
populations, with the exception of duplications of NPHP1 at 2q13, in 
which carrier frequencies in East Asian samples were up to 4.6-fold 
higher than in other populations, further highlighting the potential 
for variant interpretation to be confounded by the limited diversity 
of existing SV references (Supplementary Fig. 21). 

In the context of variant interpretation, the current gnomAD-SV 
resource will permit a screening threshold of allele frequencies less 
than 0.1% when matching on ancestry to the populations sampled 
here, and allele frequencies less than 0.004% globally. In the current 
release, we catalogued at least one pLoF or copy-gain variant for 36.9% 
and 23.7% of all autosomal genes, respectively, and 490 genes with at
least one homozygous pLoF SV (Fig. 6b, Extended Data Fig. 6e, Supplementary
Fig. 22). We also benchmarked carrier rates for several
categories of clinically relevant variants in gnomAD-SV. First, 0.32%
of samples carried a very rare (allele frequency < 0.1%) SV resulting in
pLoF of a gene for which incidental findings are clinically actionable,
nearly half of which (that is, 0.13% of all samples) would meet diagnostic
criteria as pathogenic or likely pathogenic based upon the American
College of Medical Genetics (ACMG) recommendations’ (Fig. 6c).
Second, 7.22% of individuals were heterozygous carriers of rare pLoF
SVs in known recessive developmental disorder genes”. Third, we
estimated that 3.8% of the general population (95% confidence interval
of 3.2-4.6%) carries at least one very large (>1 Mb) rare autosomal
SV, roughly half of which (45.2%) were balanced or complex (Fig. 6d).
Among these was an example of localized chromosome shattering
involving at least 49 breakpoints, yet resulting in largely balanced
products, reminiscent of chromothripsis, in an adult with no known
severe disease or DNA repair defect®"*” (Fig. 6e, Extended Data Fig. 8).
Collectively, these analyses highlight the potential of gnomAD-SV
and WGS-based SV methods to augment disease-association studies 
and clinical interpretation across a broad spectrum of variant classes 
and study designs. 
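
The global screening threshold of 0.004% quoted above is consistent with the smallest non-zero allele frequency that can be observed among the 12,653 unrelated diploid genomes retained for analysis, that is, one allele in 25,306. This correspondence is an inference from the numbers in the text, not a stated derivation. The sketch below shows the arithmetic and a simple frequency filter; the function names and the example allele counts are hypothetical.

```python
def min_observable_af(n_samples: int, ploidy: int = 2) -> float:
    """Smallest non-zero allele frequency resolvable in a cohort (one allele copy)."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return 1.0 / (ploidy * n_samples)


def passes_frequency_filter(allele_count: int, allele_number: int, max_af: float) -> bool:
    """True if the reference-panel allele frequency is below the screening threshold."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return allele_count / allele_number < max_af


# Cohort size taken from the text; autosomal loci assumed diploid.
print(f"{min_observable_af(12_653):.4%}")

# Hypothetical candidate SVs screened at the 0.1% threshold.
print(passes_frequency_filter(1, 25_306, 0.001))   # singleton in the reference
print(passes_frequency_filter(30, 25_306, 0.001))  # too common to pass
```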


Discussion 


Human genetic research and clinical diagnostics are becoming increas- 
ingly invested in capturing the complete landscape of variation in 
individual genomes. Ambitious international initiatives to generate 
short-read WGS in many thousands of individuals from common disease 
cohorts have underwritten this goal*°, and millions of genomes will 
be sequenced in the coming years from national biobanks*”’. A central 
challenge to these efforts will be the uniform analysis and interpretation 
of all variation accessible to WGS, particularly SVs, which are frequently 
invoked as a source of added value offered by WGS. Indeed, early WGS 
studies in cardiovascular disease and autism have been largely consist- 
ent in their analyses of short variants, but every study has differed in its 
analysis of SVs!8 7°40, Thus, while ExAC and gnomAD have prompted 
remarkable advances in medical and population genetics for short 
variants, the same gains have not yet been realized for SVs. Although 
gnomAD-SV is not exhaustively comprehensive, it was derived from 
WGS methods and a reference genome that match those currently used 
in many research and clinical settings, which will help to facilitate the 
eventual standardization of SV discovery, analysis, and interpretation 
across studies. 

Most foundational assumptions about human genetic variation were 
consistent between SVs and short variants in gnomAD, most notably 
that SVs segregate stably on haplotypes in the population and experi- 
ence selection commensurate with their predicted biological conse- 
quences. This study also spotlights unique aspects of SVs, such as their 
remarkable mutational diversity, their varied functional effects on 
coding sequence, and the intense selection against large and complex 
a, Comparison of carrier frequencies for 49 putatively disease-associated 
deletions (red) and duplications (blue) at genomic disorder loci between 
gnomAD-SV and microarray analyses in the UK Biobank (UKBB)**. Light bars 
indicate binomial 95% confidence intervals. Solid grey line represents linear 
best fit. b, At least one pLoF or copy-gain SV was detected in 36.9% and 23.7% of 
all autosomal genes, respectively. ‘Constrained’ and ‘unconstrained’ includes 
the least and most constrained 15% of all genes based on LOEUF*, respectively. 
c, Carrier rates for very rare (allele frequency < 0.1%) pLoF SVs in medically 
relevant genes across several gene lists’**“*. SVs per category listed in 
Supplementary Table 9. d, Carrier rates for very large (≥1 Mb) rare autosomal 
SVs among 12,653 genomes. Bars represent binomial 95% confidence intervals. 
e, A complex SV involving at least 49 breakpoints and seven chromosomes 
(also see Extended Data Fig. 8). Teal arrows indicate insertion point into 
chromosome 1. 


SVs. Our analyses also demonstrate that gene-altering effects of SVs 
beyond pLoF are remarkably similar to the mutational constraints 
of SNVs, and that SNV constraint metrics are not specific to haploin- 
sufficiency but underlie a general intolerance to alterations of both 
gene dosage and structure. Beyond genes, we uncovered widespread 
but modest selection against noncoding dosage alterations of many 
families of cis-regulatory elements. This study represents one of the 
largest empirical assessments of noncoding dosage sensitivity in 
humans, and underscores that: (1) few—if any—classes of noncoding 
cis-regulatory variants are likely to experience selection as strong as 
protein-truncating variants; (2) sequence conservation is unsurpris- 
ingly one of the strongest features associated with selection against 
noncoding SVs; and (3) current WGS sample sizes are vastly underpowered 
to identify individual constrained functional elements in the 
noncoding genome. 

The value of the multi-algorithm ensemble approach and deep WGS 
is evident in the improved sensitivity of SV detection in gnomAD-SV. 
However, short-read WGS remains limited by comparison to emerging 
long-read technologies”. Given that short-read WGS is blind to a dis- 
proportionate fraction of repeat-mediated SVs and small insertions by 
comparison to long-read methods, this study certainly underestimates 
the true mutation rates within such hypermutable regions. Similarly, 
although our approach involves extensive methods to resolve complex 
SV alleles, some variants such as high-copy-state MCNVs often involve 
complicated haplotype configurations, and we expect that emerging 
de novo assembly and graph-based genome representations will greatly 
expand our knowledge of such SVs’. Nonetheless, 92.7% of all known 
autosomal protein-coding nucleotides are not localized to simple- or 
low-copy repeats, and therefore we expect that the catalogues of SVs 
accessible to short-read WGS across large populations like gnomAD-SV 
will capture a majority of the most interpretable gene-disrupting SVs 
in humans. 


The scale of short-read WGS datasets currently in production has 
magnified the need for publicly available SV resources, and gnomAD-SV 
represents an initial effort to fill this void. Although these data remain 
insufficient to derive accurate estimates of gene-level constraint, 
sequence-specific mutation rates, and intolerance to noncoding SVs, 
they provide a step towards these goals and reinforce the value of data 
sharing and harmonized analyses of aggregated genomic data sets. 
These data have been made available without restrictions on reuse 
(https://gnomad.broadinstitute.org), and this resource will catalyse 
new discoveries in basic research while providing immediate clinical 
utility for the interpretation of rare structural rearrangements across 
human populations. 
Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


All gnomAD-SV site-frequency data for appropriately consented samples 
(n = 10,847) have been distributed in VCF and BED format via the 
gnomAD browser (https://gnomad.broadinstitute.org/downloads/), 
as well as from NCBI dbVar under accession nstd166. Furthermore, 
these SVs have been integrated directly into the gnomAD browser®. The 
architecture of the gnomAD browser is described in the main gnomAD 
study‘, as well as instructions for how to access and query the data 
hosted therein. 


Code availability 


The gnomAD-SV discovery pipeline is publicly available via a series 
of methods configured for the FireCloud/Terra platform (https:// 
portal.firecloud.org/#methods) under the methods namespace 
‘Talkowski-SV’. The svtk software package used extensively in the 
gnomAD-SV discovery pipeline is publicly available via GitHub (https:// 
github.com/talkowski-lab/svtk). Most custom scripts used in the pro- 
duction and/or analysis of the gnomAD-SV dataset are publicly available 
via GitHub (https://github.com/talkowski-lab/gnomad-sv-pipeline). All 
code is made available under the MIT license, unless stated otherwise. 
Extended Data Fig. 1 | Detection of chromosome-scale dosage alterations. 
We estimated ploidy (that is, whole-chromosome copy number) for all 24 
chromosomes per sample. a, Distribution of autosome ploidy estimates across 
14,378 samples passing initial data quality thresholds. White diamonds 
indicate medians. Individual points are outlier samples at least three standard 
deviations away from the cohort-wide mean. The outlier points marked in red 
and blue correspond to the samples highlighted in b-e. b-e, Samples with 
outlier autosome ploidy estimates typically contained somatic or mosaic 
chromosomal abnormalities, such as somatic aneuploidy of chromosome 1 
(chr1) (b) or chromosome 8 (e), or large focal somatic or mosaic CNVs on 
chromosome 3 (c) and chromosome 7 (d). Each panel depicts copy-number 
estimates in 1-Mb bins for each rearranged sample in red or blue. Dark, medium 
and light-grey background shading indicates the range of copy number 
estimates for 90%, 99% and 99.9% of all gnomAD-SV samples, respectively, and 
the medium grey line indicates the median copy number estimate across all 
samples. Regions of unalignable N-masked bases >1 Mb in the reference 
genome are masked with grey rectangles. f, Sex chromosome ploidy estimates 
for all samples from a. We inferred karyotypic sex by clustering samples to their 
nearest integer ploidy for sex chromosomes. Several abnormal sex 
chromosome ploidies are marked, including XYY (i), XXY (ii), XXX (iii), and 
mosaic loss-of-Y (iv). g, Histogram representation of the data from f. Essentially 
all samples conformed to canonical sex chromosome ploidies. 
Extended Data Fig. 2 | Benchmarking the technical qualities of the 
gnomAD-SV callset. We evaluated the quality of gnomAD-SV with seven 
orthogonal analyses detailed in Supplementary Table 4, Supplementary 
Figs. 6-9 and Supplementary Note 1. Four core analyses are presented here. 
a, Apparent rates of de novo (that is, spontaneous) heterozygous SVs per child 
across 970 parent-child trios. Each point is a single trio, and vertical lines 
denote medians. Given the expected mutation rate of SVs accessible to 
short-read WGS!”” (<1 true de novo SV per trio; see also Fig. 3a), effectively all 
de novo SVs represented a combination of false-positive genotypes in children 
and/or false-negative genotypes in parents. SVs passing all filters and included 
in the final gnomAD-SV callset (‘pass’) are shown in green. For comparison, 
variants that did not pass post hoc site-level filters (‘not pass’) are also shown in 
purple. b, Hardy-Weinberg equilibrium (HWE) metrics for all biallelic SVs 
localized to autosomes. Deviation from HWE was assessed using a chi-square 
goodness-of-fit test with one degree of freedom. Vertex labels reflect 
genotypes: 0/0 denotes homozygous reference; 0/1 denotes heterozygous; 
and 1/1 denotes homozygous alternate, with all sites shaded by chi-squared 
P value. c, Linkage disequilibrium between SVs and SNVs or indels for 23,706 
common (allele frequency >1%) SVs represented as cross-population maximum 
R² values after excluding repetitive and low-complexity regions (see 
Supplementary Fig. 7). Points and vertical bars represent medians and 
interquartile ranges, respectively. d, Correlation of allele frequency (AF) for 
37,907 common SVs captured by both the 1000 Genomes Project and 
gnomAD-SV'. Pearson’s correlation coefficient (R2) is provided. 
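
The Hardy-Weinberg check described in panel b can be illustrated with a minimal sketch. It assumes genotype counts for a single biallelic site are already available; the function name `hwe_chi_square` and the example counts are hypothetical and are not taken from the gnomAD-SV pipeline. With one degree of freedom, the chi-square tail probability can be computed from the complementary error function, so no external library is needed.

```python
from math import erfc, sqrt


def hwe_chi_square(n_ref_hom, n_het, n_alt_hom):
    """Chi-square goodness-of-fit test (1 d.f.) for Hardy-Weinberg equilibrium.

    Takes genotype counts (0/0, 0/1, 1/1) for one biallelic site and
    returns (chi2, p_value).
    """
    n = n_ref_hom + n_het + n_alt_hom
    if n == 0:
        raise ValueError("no genotyped samples")
    # Reference allele frequency estimated from the observed genotypes.
    p = (2 * n_ref_hom + n_het) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_hom, n_het, n_alt_hom)
    if min(expected) == 0:
        # Monomorphic site: the test is undefined, so report no deviation.
        return 0.0, 1.0
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 d.f., P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2.0))


# Hypothetical site with no heterozygotes: a strong deviation from HWE.
chi2, p_value = hwe_chi_square(50, 0, 50)
print(f"chi2 = {chi2:.1f}, P = {p_value:.2e}")
```

A site whose genotype counts match the expected proportions exactly (for example 25, 50 and 25) gives a statistic of zero and a P value of one.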
Extended Data Fig. 3 | In silico confirmation of SVs in gnomAD-SV with 
long-read WGS. We used Pacific Biosciences (PacBio) long-read WGS data 
available for four samples in this study to perform in silico confirmation to 
estimate the positive predictive value and breakpoint accuracy for SVs in 
gnomAD-SV”"*>*6 (Supplementary Fig. 10). a, Counts of SVs evaluated per 
sample in this analysis. SVs were restricted to those with breakpoint-level read 
support (that is, ‘split-read’ evidence, 92.8% of all SVs) and did not have 
breakpoints localized to annotated simple repeats or segmental duplications. 
b, An iterative local long-read WGS realignment algorithm, VaPoR”, was used to 
perform in silico confirmation of SVs predicted from short-read WGS in 
gnomAD-SV. As noted by the VaPoR developers”, the performance of this 
approach was sensitive to the sequencing depth of long-read WGS data. 
Therefore, the weighted mean of the four samples was used as a study-wide 
long-read WGS confirmation rate, weighting the confirmation rate of each 
sample based on the square root of its long-read WGS sequencing depth. 
c, Confirmation rates stratified by SV class, size and allele frequency. A mean of 
4,829 SVs per sample were assessed. Horizontal green bars denote weighted 
means. 
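
The depth-weighted mean described in panel b can be sketched as follows. This is an illustration only: the function name `weighted_confirmation_rate` and the example rates and depths are hypothetical, and the only detail taken from the text is that each sample's confirmation rate is weighted by the square root of its long-read sequencing depth.

```python
from math import sqrt


def weighted_confirmation_rate(rates, depths):
    """Mean of per-sample confirmation rates, weighted by sqrt(depth)."""
    if not rates or len(rates) != len(depths):
        raise ValueError("rates and depths must be non-empty and equal in length")
    weights = [sqrt(d) for d in depths]
    return sum(r * w for r, w in zip(rates, weights)) / sum(weights)


# Hypothetical values for two samples (not taken from the study).
rates = [0.90, 0.80]    # fraction of SVs confirmed per sample
depths = [4.0, 16.0]    # long-read sequencing depth per sample
print(round(weighted_confirmation_rate(rates, depths), 4))
```

Using the square root rather than the raw depth limits the influence of the most deeply sequenced sample while still giving it more weight than a shallow one.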
Extended Data Fig. 4 | SVs contribute a substantial burden of rare, 
homozygous, and coding mutations per genome. a-d, Counts of SVs per 
genome across a variety of parameters, corresponding to median counts of 
total SVs (a), homozygous SVs (b), rare SVs (c) and singleton SVs (d). Samples 
are grouped by population and coloured by SV types. The solid bar to the left of 
each population indicates the population median. e-g, Median counts of genes 
disrupted by SVs per genome when considering all SVs (including MCNVs) (e), 
homozygous SVs (including MCNVs) (f), and rare SVs (g). Colours correspond 
to predicted functional consequence. h, Counts of pLoF SVs per genome. For 
certain categories, such as genes disrupted by rare SVs per genome, a subset of 
samples (<5%) were enriched above the population average, as expected for 
individuals carrying large, rare CNVs predicted to cause the disruption of 
dozens or hundreds of genes (see Extended Data Fig. 1); for the purposes of 
visualization, the y axis for all panels has been restricted to a maximum of three 
interquartile ranges above the third quartile across all samples for each 
category. 
Extended Data Fig. 5 | Rearrangement size is a primary determinant of 
allele frequency for most classes of SVs. a, Proportion of singleton SVs in five 
SV size bins for each class of biallelic SVs considered in this study. Intergenic 
SVs (light colours; n = 206,954) exhibited reduced singleton proportions when 
compared to all SVs (dark colours; n = 335,470) of the same size and class. Bars 
reflect 95% confidence intervals from 100-fold bootstrapping. Categories with 
fewer than ten SVs are not shown. b, To account for the strong dependency of 
singleton proportion on SV size and class, we developed the APS metric, which 
normalizes singleton proportions using SV-specific technical and genomic 
covariates to permit comparisons of the frequency spectra across SV classes 
(see Supplementary Fig. 14). The same data as in a are shown, transformed onto 
the APS scale, which shows effectively no dependency on SV size for intergenic 
SVs. Bars reflect 95% confidence intervals from 100-fold bootstrapping. 
Residual deviation from APS = 0 is maintained when considering all SVs, owing 
to APS being intentionally calibrated to intergenic SVs as a proxy for neutral 
variation. Because larger SVs are more likely to be gene-disruptive, they 
upwardly bias the APS point estimates due to residual negative selection not 
captured by SV size alone. Counts of SVs per category for both a and b are listed 
in Supplementary Table 9. 
Extended Data Fig. 6 | Most SVs within genes appear under negative 
selection. a, Enrichments for pLoF consequences among rare and singleton 
SVs across SV classes. b, Enrichments for non-pLoF functional consequences 
among rare and singleton SVs across SV classes. c, Adjusted proportion of 
singletons across SV types and functional consequences. d, APS among 
deletions relative to count of exons and whole genes deleted. e, Fractions of all 
autosomal protein-coding genes with at least one SV across a variety of 
functional consequences. f, Relationship of APS and constraint against pLoF 
SNVs*. For this analysis, intronic, promoter and UTR SVs were required to have 
precise breakpoints (that is, have ‘split-read’ support) to protect against any 
cryptic overlap with coding sequence unable to be annotated due to imprecise 
breakpoints. For c, d and f, points and vertical bars represent 95% confidence 
intervals from 100-fold bootstrapping, respectively. Counts of SVs per 
category in c and d are provided in Supplementary Table 9. For d and f, 
deletions in highly repetitive or low-complexity sequence (>30% coverage by 
annotated segmental duplications or simple repeats) were excluded. 
Extended Data Fig. 7 | gnomAD-SV can augment disease association 
studies. a, Functional enrichments of 2,307 common SVs in strong linkage 
disequilibrium (R² > 0.8) with an SNV associated with a trait or disease in the 
GWAS catalogue or the UK Biobank*?*. Points represent odds ratios of SVs 
being in strong linkage disequilibrium with at least one GWAS-significant SNV 
among all SVs in strong linkage disequilibrium with at least one SNV (total 
n = 15,634 SVs). Single and triple asterisks correspond to nominal (P < 0.05) and 
Bonferroni-corrected (P < 0.0083) significance thresholds from a two-sided 
Fisher’s exact test, respectively. Bars represent 95% confidence intervals. Test 
statistics, SV counts, and P values are provided in Supplementary Table 6. 
b, Example locus at 16q22.1, where we identified a 336-bp deletion in strong 
linkage disequilibrium with SNVs significantly associated with hypothyroidism 
in the UK Biobank™. Top, the GWAS signal among genotyped SNVs in the UK 
Biobank, coloured by strength of linkage disequilibrium (Pearson’s R² value) 
with the 336-bp deletion identified in gnomAD-SV. Bottom, the local genomic 
context of this deletion, which overlaps an annotated intronic Alu element near 
(<1 kb) the first exon of a highly constrained, thyroid-expressed gene, 
ATP6V0D1. The deletion lies amidst histone mark peaks commonly found at 
active enhancers (H3K27ac and H3K4me1) based on publicly available 
chromatin data from adult thyroid samples, a phenotype-relevant tissue*®. 
Human Alu elements are known to frequently act as enhancers, and the sentinel 
hypothyroidism SNV from the UK Biobank GWAS is a significant 
expression-modifying variant (that is, eQTL) for ATP6V0D1 and other nearby 
genes across many tissues, which indicates that the hypothyroidism risk 
haplotype modifies expression of ATP6V0D1 and/or other genes, potentially 
through the deletion of an intronic enhancer*”. 
Extended Data Fig. 8 | An extremely complex SV involving 49 breakpoints 
and seven chromosomes. A highly complex insertion rearrangement from 
gnomAD-SV in which 47 segments from six different chromosomes were 
duplicated and inserted into a single locus on chromosome 1, forming a 
626,065 bp stretch of contiguous inserted sequence composed of shattered 
fragments. Given the involvement of multiple chromosomes, the signature of 
localized shattering, and the clustered breakpoints, we note that this 
rearrangement has several hallmarks of germline chromothripsis, which has 
been observed in healthy adults previously, albeit rarely”. However, unlike 
previous reports of germline chromothripsis, there are no apparent 
whole-chromosome translocations, and all segments were duplicated before 
being inserted in a compound manner into chromosome 1, potentially 
suggesting a replication-based repair mechanism. The exact origin of this 
rearrangement is unclear. a, Circos representation of all 49 breakpoints and 
seven chromosomes involved in this SV. Teal arrows indicate insertion point 
into chromosome 1. b, The median segment size was 8.4 kb. c, Linear 
representation of the rearranged inserted sequence. Colours correspond to 
chromosome of origin, and arrows indicate strandedness of the inserted 
sequence, relative to the GRCh37 reference. 
The acceleration of DNA sequencing in samples from patients and population studies 
has resulted in extensive catalogues of human genetic variation, but the 
interpretation of rare genetic variants remains problematic. A notable example of this 
challenge is the existence of disruptive variants in dosage-sensitive disease genes, 
even in apparently healthy individuals. Here, by manual curation of putative 
loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome 
Aggregation Database (gnomAD)', we show that one explanation for this paradox 
involves alternative splicing of mRNA, which allows exons of a gene to be expressed at 
varying levels across different cell types. Currently, no existing annotation tool 
systematically incorporates information about exon expression into the 
interpretation of variants. We develop a transcript-level annotation metric known as 
the ‘proportion expressed across transcripts’, which quantifies isoform expression for 
variants. We calculate this metric using 11,706 tissue samples from the Genotype 
Tissue Expression (GTEx) project” and show that it can differentiate between weakly 
and highly evolutionarily conserved exons, a proxy for functional importance. We 
demonstrate that expression-based annotation selectively filters 22.8% of falsely 
annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while 
removing less than 4% of high-confidence pathogenic variants in the same genes. 
Finally, we apply our expression filter to the analysis of de novo variants in patients 
with autism spectrum disorder and intellectual disability or developmental disorders 
to show that pLoF variants in weakly expressed regions have similar effect sizes to 
those of synonymous variants, whereas pLoF variants in highly expressed exons are 
most strongly enriched among cases. Our annotation is fast, flexible and 
generalizable, making it possible for any variant file to be annotated with any isoform 
expression dataset, and will be valuable for the genetic diagnosis of rare diseases, the 
analysis of rare variant burden in complex disorders, and the curation and 
prioritization of variants in recall-by-genotype studies. 


A primary challenge in the use of genome and exome sequencing to 
predict human phenotypes is that our capacity to identify genetic 
variation exceeds our ability to interpret their functional impact**. 
One underappreciated source of variability for variant interpretation 
involves differences in alternative mRNA splicing, which enables exons 
to be expressed at different levels across tissues. These expression 
differences mean that variants in different regions of a gene can have 
different phenotypic outcomes depending on the isoforms they affect. 
For example, variants that occur in an exon differentially included in 
two isoforms of CACNA1C with diverse patterns of tissue expression 
result in distinct types of Timothy syndrome’. Pathogenic variants in the 
isoform that exhibits multi-tissue expression result in a multi-system 
disorder*’, whereas those on the isoform predominantly expressed in 
the heart result in more severe and specific cardiac defects®. In addition, 
Fig. 1 | Curation of pLoF variants in haploinsufficient disease genes found 
in gnomAD reveals transcript errors as a major confounding error mode in 
variant annotation. We identified and manually curated 401 pLoF variants in 
the gnomAD dataset in 61 haploinsufficient severe developmental delay genes 
and flagged any reason the pLoF may not be a true LoF variant. Top, the 
frequency of each error mode present in the 306 variants classified as unlikely 


Mendelian disease variants have been found on tissue-specific isoforms*”° 
and isoform expression levels in TTN have been used to show 
that pLoF variants found in healthy controls occur in exons that are 
absent from dominantly expressed isoforms, whereas those in patients 
with dilated cardiomyopathy occur on constitutive exons", emphasiz- 
ing the utility of exon expression information for variant interpretation. 


Isoform diversity and variant interpretation 


We find that isoform diversity is a contributor to the paradoxical find- 
ing of disruptive variants in dosage-sensitive disease genes in osten- 
sibly healthy individuals. In the gnomAD database, we identify 401 
high-quality pLoF variants that pass both sequencing and annotation 
quality filters in 61 haploinsufficient disease genes in which heterozy- 
gous pLoF variants are established to cause severe developmental 
delay phenotypes with high penetrance (Methods). Given the severity 
of these phenotypes and their extremely low prevalence worldwide, 
ranging from 1 in 10,000 to less than 1 in a million, very few, if any, true 
pLoF variants would be expected to be found in the gnomAD popula- 
tion. As such, most or all of these observed pLoF variants are likely to 
be sequencing or annotation errors”. Manual curation of these variants 
reveals common error modes that result in probable misannotation of 
pLoFs, with diversity of transcript structure, mediated by variants falling 
on low-confidence transcripts, emerging as a major consideration 
(Fig. 1, Supplementary Fig. 1, Supplementary Tables 1-3). However, no 
existing tools systematically incorporate information on transcript 
expression into variant interpretation. 


pext score summarizes isoform expression 


The advent of large-scale transcriptome sequencing datasets, such 
as GTEx’, provides an opportunity to incorporate cross-tissue exon 
expression into variant interpretation. However, the current formats 
of these databases do not readily allow for unbiased estimation of exon 
expression. The GTEx web browser offers information on exon-level 
read pileup across tissues, but this approach is confounded by technical 
artefacts such as 3’ bias” (preferential coverage of bases close to the 3’ 
end of a transcript) (Supplementary Fig. 2a). Such systematic biases 
to be a true LoF. Transcript errors emerge as a major putative error mode in the 
annotation of these pLoF variants. Bottom, bee swarm plot shows the average 
pext score across GTEx tissues for each variant in the error categories. This 
shows that pext values are discriminately lower for variants that are annotated 
as possible transcript errors (P=4.1x 10°, two-sided Wilcoxon test between 
transcript errors and other error modes). 


mean that simple exon-level coverage in a transcriptome dataset cannot 
be used as a reliable proxy for exon expression, especially in longer 
genes (Fig. 2a, Supplementary Fig. 2b). 

Isoform quantification tools provide estimates of isoform expression 
levels that correct, albeit imperfectly”, for confounding by 3’ bias as 
well as other technical artefacts such as isoform length, isoform GC content, 
and transcript sequence complexity”. Here, we use isoform-level 
quantifications from 11,706 tissue samples from the GTEx v7 dataset to 
derive an annotation-specific expression metric. For each tissue, we 
annotate each variant with the expression of every possible consequence 
across all transcripts, which can be used to summarize expression in any 
combination of tissues of interest. We first compute the median expression 
of a transcript across tissue samples, and define the expression of 
a given variant as the sum of the expression of all transcripts for which 
the variant has the same annotation (Fig. 2a, Supplementary Fig. 3a). By 
normalizing the expression of the annotation to the total gene expression, 
we define a metric (proportion expressed across transcripts, or 
‘pext’), which can be interpreted as a measure of the proportion of the 
total transcriptional output from a gene that would be affected by the 
variant annotation in question (Supplementary Fig. 3b). 
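
The calculation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the per-sample values and the choice of which transcripts carry the annotation are all hypothetical, and total gene expression is assumed here to be the sum of the per-transcript medians. The toy numbers are loosely modelled on a three-transcript gene measured in brain cortex and liver.

```python
from math import nan
from statistics import median


def median_expression(samples_by_transcript):
    """Median expression of each transcript across the samples of one tissue."""
    return {tx: median(values) for tx, values in samples_by_transcript.items()}


def pext(transcript_expr, annotated_transcripts):
    """Share of a gene's expression carried by transcripts with a given annotation.

    transcript_expr: transcript ID -> median expression in one tissue.
    annotated_transcripts: IDs on which the variant has the annotation of
    interest (for example, pLoF).
    """
    # Assumption: gene expression is the sum over all annotated isoforms.
    total = sum(transcript_expr.values())
    if total == 0:
        return nan  # gene not expressed in this tissue
    affected = sum(transcript_expr[tx] for tx in annotated_transcripts)
    return affected / total


# Hypothetical per-sample isoform quantifications for brain cortex.
cortex_samples = {
    "A": [24.0, 25.0, 26.0],
    "B": [0.1, 0.1, 0.2],
    "C": [9.0, 10.0, 12.0],
}
cortex = median_expression(cortex_samples)   # A: 25, B: 0.1, C: 10
liver = {"A": 10.0, "B": 10.0, "C": 0.5}     # medians supplied directly

# Hypothetical variant annotated as pLoF on transcripts A and C only.
for tissue, expr in (("cortex", cortex), ("liver", liver)):
    print(tissue, round(pext(expr, ["A", "C"]), 3))
```

In this toy case the same variant affects nearly all transcriptional output in cortex but only about half in liver, because transcript B, which lacks the annotation, is expressed mainly in liver.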

The pext metric allows for quick visualization of the expression of 
exons across a gene. In Fig. 2b, transcript-expression based annotation 
is shown for TCF4, a haploinsufficient gene in which heterozygous 
variants result in Pitt-Hopkins syndrome’, a highly penetrant disor- 
der associated with severe developmental delay. This gene contains 
20 unique high-quality pLoF mutations across 56 individuals in the 
gnomAD database. All 20 variants lie on exons with no evidence of 
expression across the GTEx dataset (Fig. 2b, Supplementary Fig. 4), 
which indicates that functional TCF4 protein can be made in the presence 
of these variants. This visualization is now available for all genes 
in the gnomAD browser (https://gnomad.broadinstitute.org), and can 
aid in the rapid identification of variants occurring on exons with little 
to no evidence of expression in GTEx. 


Functional validation of pext 


To explore whether expression-based annotation marks functionally 
important regions, we compared the distribution of the pext metric in 


Nature | Vol 581 | 28 May 2020 | 453 


Article 


Fig. 2 (graphic not reproduced). Panel a, read pileup in brain cortex across exons 1 to 10, annotated with the workflow: raw reads from RNA-seq exhibit 3’ bias; isoform quantifications are obtained from the GTEx v7 dataset; median transcript expression is calculated across samples per tissue (example values for cortex and liver: transcript A, 25 and 10; transcript B, 0.1 and 10; transcript C, 10 and 0.5); and the expression of a position is defined as the sum of transcripts per annotation. The resulting transcript-level expression tracks for brain cortex and liver do not exhibit 3’ bias, reveal exon usage differences between tissues (exon 8) and reveal base-level expression differences between tissues (exon 3). Panel b, y axis from 0 to 1.0. 


Fig. 2 | Summary of transcript-expression based annotation method. 
a, Overview of transcript-aware annotation. Most genes have many annotated 
isoforms, which can have varying expression patterns across tissues. Using the 
number of reads aligning to exonic regions in transcriptome datasets as a 
proxy for exon expression (top, black) has confounding effects, due to 3’ bias. 
In this example, although exons 3 and 8 have markedly different expression 
levels in brain cortex, the number of reads aligning to the two exons is similar, 
and this masks the differences in exon usage. Transcript-aware annotation 


evolutionarily conserved and unconserved regions using phyloCSF”. 
Exons with patterns of multi-species conservation consistent with 
coding regions have higher phyloCSF scores, and should exhibit detect- 
able expression patterns, whereas regions with lower scores will be 
enriched for incorrect exon annotations, which are expected to have 
little evidence of expression in a population transcriptome dataset. 
As expected, we observe significantly lower expression for uncon- 
served regions, and near-constitutive expression in highly conserved 
regions (Fig. 3a, Supplementary Fig. 5a). This difference remains statis- 
tically significant after correcting for exon length (logistic regression 
P<1.0 x10), which can influence both phyloCSF scores and isoform 
quantifications, indicating that transcript expression-aware annotation 
marks functionally relevant exonic regions. 

Although the metrics are associated, we find that pext provides 
orthogonal information to conservation for variant interpretation. 
For example, regions with low evidence of conservation but high 
expression (Fig. 3a) are enriched for genes in immune-related path- 
ways (Methods), which are selected for diversity but represent true 
coding regions. In addition, the pext value is higher for pLoF variants 
annotated as high confidence by the loss-of-function transcript 
effect estimator (LOFTEE) package’ and carrying no additional flags than 
for those flagged as having been found on unlikely open-reading frames or 
weakly conserved regions (Fig. 3b, Supplementary Fig. 5b). However, 
high-confidence LOFTEE variants with no flags can also have 
low pext values, which suggests that transcript-expression-aware 
annotation adds additional information to the currently available 
interpretation toolkit. 
defines the expression of every variant as the sum of transcripts that have the 
same annotation. The resulting transcript-level expression plots do not exhibit 
3’ bias, and reveal differences in exon usage, such as those in exons 3 and 8, 
across tissues. b, Example of utility of transcript-expression based annotation. 
There are 20 high-quality pLoF variants in the haploinsufficient developmental 
delay gene TCF4 in gnomAD, annotated as dashed lines and arrows. All 20 
variants have no evidence of expression in the GTEx dataset, which suggests 
that functional TCF4 protein can be made in the presence of these variants. 


We undertook manual evaluation of 128 regions marked as unexpressed
(mean pext < 0.1 in all tissues and in GTEx brain) in 61 haploinsufficient
genes following the GENCODE manual annotation workflow”
to evaluate the annotation quality in these coding sequence (CDS) 
regions. One-third of flagged regions were associated with low-quality 
models that have been removed or switched to non-coding biotypes 
in subsequent GENCODE releases (Supplementary Fig. 6), and 70% of 
the remaining regions correspond to models that satisfy only mini- 
mum criteria for inclusion in the gene set, corresponding to ‘putative’ 
annotations that lack markers for CDS functionality (Supplementary 
Table 4). Nonetheless, we find support for some highly conserved CDS 
regions, several of which show evidence of transcription in fetal tissues, 
underlining the importance of incorporating several isoform expres- 
sion datasets for interpretation (Supplementary Fig. 6d). 

Non-synonymous variants found on constitutively expressed regions 
would be expected to be more deleterious than those on regions with 
no evidence of expression. To test this, we defined expression bins 
based on the average pext value across GTEx tissues, in which an average
pext value less than 0.1 was defined as low (or unexpressed), above
0.9 as high (or near-constitutive) and intermediate values as medium
expression. We compared the mutability-adjusted proportion singleton
(MAPS), a measure of negative selection on variant classes”,
partitioned on the loss-of-function observed/expected upper-bound
fraction (LOEUF) decile, a measure of constraint against pLoF variants in
the gnomAD dataset’, in each of these expression bins. MAPS scores differed
substantially between pLoF variants found on low-expressed and
high-expressed regions in genes intolerant to pLoF variation (Fig. 3c, 
Fig. 3 | Functional validation of transcript-expression based annotation.
a, We define highly conserved and unconserved regions as phyloCSF >1,000 
(n=9,817) and phyloCSF <-100 (n=11,860), respectively, and compare the 
expression status of these regions across GTEx. Regions with high phyloCSF 
scores are enriched for near-constitutive expression, whereas unconserved 
regions are enriched for little to no usage across GTEx. This difference is
significant after correcting for gene length (logistic regression P<1x10"). 
We note that unconserved regions with high levels of expression (pext > 0.9) 
are enriched for immune-related genes, which are selected for diversity 

and thus have low conservation, but represent true coding regions. 

b, Transcript-expression based annotation recapitulates, and adds information 
to, existing interpretation tools. High-confidence pLoF LOFTEE variants in 
gnomAD with no flags (n= 458,880) are enriched for higher pext values, 
whereas high-confidence pLoF variants falling on low phyloCSF (n= 44,373) or 


Supplementary Fig. 5c, Supplementary Table 5a, b). This informa- 
tion is complementary to existing variant prioritization tools such as 
PolyPhen-2” (Supplementary Fig. 5d, Supplementary Table 5c). This 
skew of non-synonymous variation in high-expressed regions suggests 
that variation arising in such exons tends to be more deleterious, whereas
non-synonymous variants on regions with low expression are similar 
to missense variants in their inferred deleteriousness. 
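
The binning described above can be sketched as a small function. This is an illustration only, assuming the input is the mean pext across tissues and that values of exactly 0.1 or 0.9 fall in the medium bin (the text defines low as less than 0.1 and high as above 0.9); the function and constant names are hypothetical and are not taken from the tx_annotation code.

```python
from typing import Optional

LOW_CUTOFF = 0.1   # mean pext below this is "low" (unexpressed)
HIGH_CUTOFF = 0.9  # mean pext above this is "high" (near-constitutive)


def expression_bin(mean_pext: Optional[float]) -> Optional[str]:
    """Assign a mean pext value to the low/medium/high expression bin.

    Returns None when no pext value is available (hypothetical convention).
    """
    if mean_pext is None:
        return None
    if mean_pext < LOW_CUTOFF:
        return "low"
    if mean_pext > HIGH_CUTOFF:
        return "high"
    return "medium"
```

Treating the boundary values as medium is a choice made for this sketch; the thresholds themselves are the ones stated in the text.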


Use of pext in variant interpretation 


To evaluate the utility of transcript expression-based annotation in 
Mendelian variant interpretation, we assessed the number of variants 
that would be filtered based on a pext cut-off value of less than 0.1 (low
expression) across GTEx tissues for three gene sets. First, we evaluated 
high-quality pLoF variants in the 61 manually curated haploinsufficient 
genes in gnomAD and ClinVar”’. The low pext expression bin resulted in 
filtering of 22.8% of pLoF variants in haploinsufficient developmental 
delay genes in gnomAD, but only 3.8% of high-quality pathogenic vari- 
ants in ClinVar (P= 4.7 x 10°) (Fig. 4a, Methods). We next compared 
pLoF variants in autosomal recessive disease genes found in a homozygous
state in at least one individual in gnomAD and any pLoF variant in
these genes in ClinVar and observed similar results: expression-based 
annotation filters 30.0% of variants in gnomAD while only filtering 3.2% 
of variants in ClinVar (Fig. 4b) (P=3.5 x 10%). 

Finally, we evaluated gnomAD pLoF variants in genes that are con- 
strained against pLoF variation’ (LOEUF score < 0.35). Given that 
these genes are depleted for loss-of-function variation in the general 
population, we expect the observed pLoF variants in these genes to 


unlikely open-reading frame regions (n=2,437) are enriched for low 
expression. However, high-confidence pLoF variants can also have a low pext
score. Variants flagged as falling on regions that are unlikely open-reading frame
or have weak conservation are enriched for lower pext values. Red dots denote
the median pext value across GTEx. c, Non-synonymous variants found on
near-constitutive regions tend to be more deleterious. We compared the MAPS 
score for variants with low (<0.1), medium (0.1< pext < 0.9) and high (pext > 0.9) 
expression. Variants with near-constitutive expression have a higher MAPS 
score, which indicates higher deleteriousness than those with little to no 
evidence of expression. Points represent MAPS values and error bars denote 
the 95% confidence interval. Dashed grey and orange lines represent MAPS 
values for all gnomAD missense and synonymous variants, respectively. The 
number of variants evaluated per category and unadjusted proportion 
singleton values can be found in Supplementary Table 5a. 


be enriched for annotation errors. We compared the proportion fil- 
tered to synonymous variants in the same genes, which we expect to 
be randomly distributed. Our metric removes 16.8% of pLoF variants 
in constrained genes, but only 5.2% of synonymous variants (Fig. 4c) 
(P<1.0 x10™°). In all cases, the vast majority of filtered variants were 
otherwise high-confidence with no LOFTEE annotation flags, which 
suggests again that pext provided additional information to existing 
variant prioritization tools in removing annotation errors (Supple- 
mentary Fig. 7). 
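
The comparisons in this section contrast the proportion of variants filtered in two sets, and the Fig. 4 legend states that P values come from a two-sided Fisher's exact test on counts. The following is a minimal standard-library sketch of such a test, not the authors' analysis code. The counts in the usage lines are invented for illustration and are not the study's data.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be two variant sets (for example gnomAD and ClinVar) and
    columns "filtered" and "retained". The P value sums the probabilities
    of all tables with the same margins that are no more likely than the
    observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)


# Hypothetical counts: 23 of 100 variants filtered in one set, 4 of 100 in another.
p_value = fisher_exact_two_sided(23, 77, 4, 96)
```

For real analyses an established implementation (for example in a statistics library) would normally be preferred; the sketch only makes the counting logic explicit.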


Use of pext in burden testing 


To explore the benefits of this approach for rare variant analysis, we 
applied pext binning to burden testing of de novo variants in patients 
with developmental delay/intellectual disability (DD/ID) or autism spec- 
trum disorder (ASD) using a set of 23,970 de novo variants collated from 
several studies including the Deciphering Developmental Disorders 
(DDD) project and the Autism Sequencing Consortium (ASC)**”*. We 
find that de novo pLoF variants in patients with DD/ID in low-expressed 
regions have similar effect sizes to those of synonymous variants (rate 
ratio of low-expressed pLoFs =1.08, P=0.90), whereas pLoF variants in 
highly expressed regions have much larger effect sizes (rate ratio= 4.64, 
P=3.74 x 10°) (Fig. 5a). This observation is consistent for de novo 
variants in autism (rate ratio for low-expressed pLoFs = 0.80, P=0.47; 
rate ratio for high-expressed pLoFs = 2.11, P= 8.2 x 10-8) (Fig. 5b) and 
congenital heart disease with co-morbid neurodevelopmental delay 
(Supplementary Fig. 8a) as well as rare variants (allele count < 10) identified
in highly constrained genes in the large iPSYCH case-control study
Fig. 4 | Transcript-expression based annotation aids Mendelian variant
interpretation. a, Comparison of the proportion of high-quality pLoF variants
filtered in a curated list of 61 haploinsufficient developmental delay genes in
gnomAD versus ClinVar with a cut-off value of average pext across GTEx < 0.1
(low expression). Expression-based filtering results in removal of 22.8% of
gnomAD pLoFs and 3.8% of a confidently curated set of pLoFs in ClinVar.
b, Expression-based annotation filters 30% of pLoF variants found in gnomAD


of Danish patients with autism spectrum disorder and attention-deficit/ 
hyperactivity disorder (Supplementary Fig. 8b). Overall, we consist- 
ently observe low-expressed pLoFs to have effect sizes similar to those 
of synonymous variants, with pLoF variants in constitutive regions 
having larger effect sizes, which suggests that incorporating transcript 
expression-aware annotation in rare variant studies can boost power 
for gene discovery. 


Discussion 

We have described the development and validation of a transcript 
expression-based annotation framework to integrate results from 
transcriptome sequencing experiments into clinical variant interpre- 
tation. Although our initial analysis uses GTEx, our method can be 
used with any isoform expression dataset to annotate any variant file 
rapidly in the scalable software framework Hail (https://hail.is). For 
example, annotation of more than 120,000 gnomAD individuals with 
GTEx takes under an hour using 60 cores, at a cost of about US$5 on 
public cloud compute, which can be further scaled to larger datasets. 
In addition, the annotations we provide are flexible: although we have 
described the use of average transcript-level expression across many 
Fig. 5 | Application of transcript-expression based annotation to de novo
variant analyses in ASD and DD/ID. a, b, Transcript-expression-based 
analyses in patients with DD/ID (a) or ASD (b). We find that de novo pLoF 
variants found on near-constitutively expressed regions in GTEx brain tissues 
have larger effect sizes than de novo LoF variants in weakly expressed regions 
in both disorders. Notably, de novo pLoF variants found on regions with little 


evidence for expression are as equally distributed in cases versus controls as 
in a homozygous state in at least one individual, and 3.2% of any pLoF variants
found in the same genes in ClinVar. c, We extended this filtering approach to
pLoF and synonymous variants in gnomAD pLoF-intolerant genes (defined by
LOEUF < 0.35). This filters 16.8% of pLoF and 5.2% of synonymous variants. The
total number of high-quality variants considered in each group is shown. For all
pLoFs only high-confidence LOFTEE variants were considered. P values were
determined by two-sided Fisher’s exact test for counts. 


tissues, alternative approaches such as using maximum expression 
across any tissue may prove useful depending on variant interpretation 
goals (Supplementary Figs. 9,10). 

We note that although this metric successfully discriminates between 
near-constitutive and low expression levels, which are useful for pri- 
oritizing and filtering variants, respectively, regions with interme- 
diate expression levels are more challenging to interpret. However, 
we hypothesize that directed analyses of intermediate expression levels
may help to determine the role of alternative splicing in phenotypic 
diversity*°. In addition, although we have binned average pext scores 
across GTEx tissues into low, medium and high expression, different 
genes will probably have varying optimal tissues and thresholds for 
variant interpretation. Regions tagged as low expression are often cor- 
roborated by expert opinion of CDS curation, but domain knowledge 
of a gene will outperform this summary metric. 

An important caveat in our approach is the imprecision of isoform 
quantification methods using short-read transcriptome data. However, 
we note that repeating key analyses in the manuscript with a different
isoform quantification tool showed consistent results (Methods, Sup- 
plementary Fig. 11, Supplementary Table 6), suggesting robustness to 
the precise pipeline used. The utility of this framework will increase 
de novo synonymous variants, which suggests that such variants can be
removed from analyses of gene burden testing to boost discovery power. 

The high pext expression bin contains 46.1%, 42.3% and 11.4%, and the 
low-expression bin contains 4.0%, 6.0% and 11.4% of 1,249, 752 and 166 de novo 
pLoF variants found in patients with DD/ID, ASD and controls, respectively. 
Points represent rate ratio estimate and error bars represent 95% confidence 


interval from the Poisson exact test. 


as our ability to quantify isoform expression across tissues improves, 
including refinement of methods and gene models, as well as availability 
of long-read RNA-sequencing data from human tissues. In addition, the 
improvement of single-cell RNA-seq technologies and the generation 
of data across human tissues will provide insight into cell type-specific 
exon usage for incorporation into variant interpretation”. 

The code used to generate pext is available as open source software 
(https://github.com/macarthur-lab/tx_annotation). In addition, we 
provide a precomputed file of the transcript expression value for every 
possible single nucleotide variant in the human genome. This metric 
has already proven useful in variant curation for the identification of 
drug targets” and for filtering variants for the identification of human 
knockouts!. Overall, our metric can be incorporated into variant inter- 
pretation in Mendelian disease pipelines, analyses of rare variant bur- 
den, and the prioritization of variants for recall-by-genotype studies. 
Methods


Data reporting 

No statistical methods were used to predetermine sample size. The 
experiments were not randomized, and investigators were not blinded 
to allocation during experiments and outcome assessment. 


Curation of pLoF variants in haploinsufficient developmental 
disease genes 

To identify haploinsufficient developmental delay genes, we selected 
genes curated by the ClinGen Dosage Sensitivity Working Group**.
Of the 61 genes, 58 had a score of 3 with sufficient evidence for pathogenicity,
whereas two genes (CHAMP1, CTCF) had a score of 2 (some
evidence) and one gene (RERE) was not yet scored. The penetrance of
pathogenic variants in each gene was reviewed in the literature, and 
only genes with more than 75% reported penetrance were included. 
These conditions are those too severe to expect to see an individual 
in gnomAD (probably unable to consent for a study without guardian- 
ship). The 61 genes include 50 autosomal genes of high severity and high 
penetrance and 11 genes on chromosome X in which the phenotype 
is expected to be severe or lethal in males and moderate to severe in 
females. The resulting gene list is available at gs://gnomad-public/ 
papers/2019-tx-annotation/data/gene_lists/HI_genes_100417.tsv. 

We extracted pLoF variants, defined as essential splice acceptor, 
essential splice donor, stop-gained, and frameshift variants, identified 
in the 61 haploinsufficient disease genes from the gnomAD v2.1.1 exome
and genome sites tables, and considered only those pLoF variants that
passed random forest filtering in the gnomAD dataset, and were anno- 
tated as high confidence by LOFTEE v1.0. Of 61 genes, 55 had at least one 
high-quality pLoF available in gnomAD. We performed manual curation 
of 401 pLoF variants using a web-based curation portal to identify any 
reason a pLoF may have been a variant calling or annotation error, and 
categorized the likelihood of each variant being a true LoF. 

Evidence for classifying an LoF variant as artefactual was categorized 
into the following groups: mapping error, strand bias, reference error, 
genotyping error, homopolymer sequence, in-frame multi-nucleotide 
variant or frame-restoring indel, essential splice site rescue, minority 
of transcripts, weak exon conservation, last exon, and other annotation
error. All possible reasons to reject a LoF consequence were also
flagged, even when a single criterion would categorize the variant as not
LoF. Variants were then categorized as LoF, probable LoF, probably not 
LoF, and not LoF based on criteria outlined in Supplementary Table 2. 
Supplementary Fig. 1a shows the distribution of the LoF verdicts for 
the 401 pLoF variants. 

Technical errors comprised genotyping errors, strand biases, refer- 
ence errors, and repetitive regions that could be detected by visual 
inspection of reads in the Integrative Genomics Viewer® (IGV) and 
from the UCSC genome browser’®’. Genotyping errors comprised 
skewed allele balances (conservative cutoff of < 35%), low complex- 
ity sequences, GC-rich regions, homopolymer tracts (>6 base pairs 
or > 6 trinucleotide repeats) and low quality metrics (genotype quality 
< 20). Strand bias was flagged when a variant was skewed preferen- 
tially on the forward or reverse strand, or when the majority (>90%) 
of a given strand covered a region; this was often observed around 
intron-exon boundaries. Strand biases despite balanced coverage 
of the forward and reverse strands were weighted towards prob- 
ably not LoF, whereas a strand bias due to skewed strand coverage 
was weighted alongside other genotyping errors. Reference errors 
were uncommon, but identified by a small deletion in a given exon, 
posing as a <5-base-pair intron. Most genotyping errors and strand 
biases in isolation were not deemed critical in deciding whether a 
variant was probably not LoF or not LoF, with the exception of allele 
balance <25%. Mapping errors were often identified by an enrichment 
of complex variation surrounding a variant of interest. Furthermore, 
the UCSC browser was used to highlight mapping discrepancies, such 


as self-chain alignments, segmental duplications, simple tandem 
repeats, and microsatellite regions. 

In-frame multi-nucleotide variants (MNVs), essential splice site 
rescue, and frame-restoring insertion-deletions are rescue events 
that are predicted to restore gene function. MNVs were visualized 
in IGV and cross-checked with codons from the UCSC browser; in-frame
MNVs that rescued stop codons were scored as not LoF. Essential
splice site rescue occurs when an in-frame alternative donor or
acceptor site is present, which probably has a minimal effect on the 
transcript. A total of 36 base pairs upstream and downstream of the 
splice variant were assessed for splice site rescue. Cryptic splice sites 
within 6 base pairs of the splice variant were considered a complete 
rescue, rendering the variant not LoF. Rescue sites >6 base pairs
away but within ±20 base pairs were weighted with less confidence,
scoring as probably not LoF. All potential splice site rescues were
validated using Alamut v.2.11
(https://www.interactive-biosoftware.com/alamut-visual/).
Frame-restoring indels were identified by
scanning approximately ±80 base pairs from the annotated indel
and counting any insertions/deletions to assess if the frame would
be restored.
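
The distance thresholds used for essential splice site rescue can be written as a simple rule. The sketch below is an illustration of the criteria as stated, not the curators' tooling. The function name, the `in_frame` argument, and the return value of None for sites more than 20 base pairs away (for which the text gives no verdict) are assumptions.

```python
from typing import Optional


def splice_rescue_verdict(distance_bp: int, in_frame: bool) -> Optional[str]:
    """Classify a candidate cryptic splice site near an essential splice variant.

    distance_bp is the offset of the candidate site from the splice variant
    (sign ignored). Only in-frame alternative sites count as a rescue.
    """
    if not in_frame:
        return None  # assumption: out-of-frame sites are not treated as rescues
    distance = abs(distance_bp)
    if distance <= 6:
        return "not LoF"           # complete rescue
    if distance <= 20:
        return "probably not LoF"  # rescue weighted with less confidence
    return None                    # no verdict stated for more distant sites
```

In the curation described here every candidate rescue was additionally validated with Alamut, which this rule does not replace.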

Transcript errors encompass issues surrounding alternative tran- 
scripts, variants within a terminal coding exon, poorly conserved 
exons, and re-initiation events. Coding variants that occupied the 
minority (<50%) of NCBI coding RefSeq transcripts for a given gene 
were considered not LoF. These variants often affected poorly con- 
served exons, as determined by PhyloP”’, PhyloCSF” and visualiza- 
tion in the UCSC browser**. The only exceptions to the minority of 
transcript criteria were cases where the exon was well conserved, 
which relegated the categorization to probably not LoF. Variants 
within the last coding exon, or within 50 base pairs of the penultimate 
coding exon were also considered not LoF, unless 25% <x < 50% of the 
coding sequence was affected, in which case the variant was deemed 
probably not LoF. If >50% of the coding sequence was disrupted by a 
variant in the last exon, this was deemed probably LoF. Other tran- 
script errors included: re-initiation errors; upstream stop codons of 
a given LoF variant; variants that fell on exactly 50% of coding RefSeq 
transcripts; and/or partial exon conservation. Re-initiation events 
were flagged when a methionine downstream of the variant in the
first coding exon was predicted to restart translation, and were predicted
to be probably not LoF. Variants occurring after a stop codon
in the last coding exon were considered not LoF, particularly across 
the region of the exon or transcript in question. Error categories 
were grouped for Fig. 1 as follows: Minority of transcripts and weak 
exon conservation were grouped as transcript errors, genotyping 
errors and homopolymers as sequencing errors, essential splice 
rescue and MNV grouped as rescue and strand bias was included in 
other annotation errors. 
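
As an illustration of the last-exon criterion in the paragraph above, the following sketch maps the fraction of coding sequence affected to a verdict. The function name is hypothetical, and the handling of a fraction of exactly 25% or 50% is an assumption, because the text specifies only the open interval 25% < x < 50% and the case of more than 50%.

```python
def last_exon_verdict(fraction_cds_affected: float) -> str:
    """Verdict for a variant in (or within 50 bp of) the last coding exon.

    fraction_cds_affected is the proportion (0 to 1) of the coding
    sequence disrupted by the variant.
    """
    if not 0.0 <= fraction_cds_affected <= 1.0:
        raise ValueError("fraction_cds_affected must be between 0 and 1")
    if fraction_cds_affected > 0.5:
        return "probably LoF"
    if fraction_cds_affected > 0.25:
        # Assumption: exactly 50% is grouped with the 25-50% interval.
        return "probably not LoF"
    return "not LoF"
```

This covers one criterion only; in the curation described above it was weighed together with the other transcript, technical and rescue flags.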

The criteria above were strictly adhered to throughout, and manual curation
was performed by two independent reviewers to ensure maximum
consistency and minimize human error. Any discordance in curation 
was re-curated by both curators together and resolved. Full results of 
manual curation are available in Supplementary Table 3. 


Calculation of transcript-expression aware annotation 

We first imported the GTEx v7 isoform quantifications into Hail and 
calculated the median expression of every transcript per tissue. This 
precomputed summary isoform expression matrix is available for GTEx 
v7 in gs://gnomad-public/papers/2019-tx-annotation/data/GRCH37_hg19/.
We also import and annotate a variant file with the Variant Effect
Predictor (VEP) version 85*° against Gencode v19”°, implemented in 
Hail with the LOFTEE v1.0 plugin. 

We use the transcript consequences VEP field to calculate the sum of 
isoform expression for variant annotations, that is, the annotation-level 
expression across transcripts (ext). For variants that have multiple con- 
sequences for one transcript (for example, a single nucleotide variant 




that is both a missense and a splice region variant on one transcript) we
use the worst consequence, ordered by VEP (in this example, missense
takes precedence over splice region). We filter the consequences to 
those only occurring on protein coding transcripts. Full ordering of the 
VEP consequences is available at: useast.ensembl.org/info/genome/ 
variation/prediction/predicted_data.html 
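
Selecting the worst consequence per transcript amounts to ranking consequence terms by severity. The sketch below uses a short, hand-picked subset of VEP consequence terms in decreasing severity; the full ordering is the one published by Ensembl at the address above, and the function name is hypothetical.

```python
from typing import Iterable, Optional

# Illustrative subset only, most severe first. Not the complete VEP ordering.
SEVERITY_ORDER = [
    "splice_acceptor_variant",
    "splice_donor_variant",
    "stop_gained",
    "frameshift_variant",
    "missense_variant",
    "splice_region_variant",
    "synonymous_variant",
]
RANK = {term: index for index, term in enumerate(SEVERITY_ORDER)}


def worst_consequence(consequences: Iterable[str]) -> Optional[str]:
    """Return the most severe known consequence term, or None if none is known."""
    known = [term for term in consequences if term in RANK]
    if not known:
        return None
    return min(known, key=RANK.__getitem__)
```

With this ranking, a variant annotated as both missense and splice region on one transcript resolves to missense, matching the example in the text.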

We then sum the expression of every transcript per variant, for every
combination of consequence, LOFTEE filter, and LOFTEE flag for every
tissue (Supplementary Fig. 3a). For example, if a single nucleotide variant
is synonymous on ENST1, a high-confidence LOFTEE stop-gained
variant on ENST3 and ENST4, and a low-confidence LOFTEE stop-gained
variant on ENST5 and ENST6, the ext values will be synonymous:
ENST1, stop-gained high-confidence: ENST3 + ENST4, and stop-gained
low-confidence: ENST5 + ENST6 per tissue. This can be computed with
the tx_annotate() function by setting the tx_annotation_type to ‘expression’.
We foresee the non-normalized ext values to be useful when only
considering one tissue of interest.
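
The summation in the worked example can be sketched in plain Python for a single tissue. This is not the Hail implementation behind tx_annotate(); the function name, the tuple layout, the placeholder transcript IDs and the TPM values are all invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (transcript ID, consequence, LOFTEE filter, LOFTEE flag)
Annotation = Tuple[str, str, Optional[str], Optional[str]]
Key = Tuple[str, Optional[str], Optional[str]]


def ext_by_annotation(
    annotations: List[Annotation], tpm: Dict[str, float]
) -> Dict[Key, float]:
    """Sum transcript expression per (consequence, filter, flag) in one tissue."""
    totals: Dict[Key, float] = defaultdict(float)
    for transcript_id, consequence, lof_filter, lof_flag in annotations:
        # Transcripts missing from the expression table contribute nothing.
        totals[(consequence, lof_filter, lof_flag)] += tpm.get(transcript_id, 0.0)
    return dict(totals)


# Hypothetical median TPM values for one tissue.
example_tpm = {"ENST1": 2.0, "ENST3": 5.0, "ENST4": 1.5, "ENST5": 0.5, "ENST6": 0.25}
example_annotations = [
    ("ENST1", "synonymous_variant", None, None),
    ("ENST3", "stop_gained", "HC", None),
    ("ENST4", "stop_gained", "HC", None),
    ("ENST5", "stop_gained", "LC", None),
    ("ENST6", "stop_gained", "LC", None),
]
example_ext = ext_by_annotation(example_annotations, example_tpm)
```

Under these made-up values the high-confidence stop-gained annotation receives the combined expression of ENST3 and ENST4, and the low-confidence one that of ENST5 and ENST6.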

To allow for taking average expression values across tissues of inter- 
est, we normalize the expression value for a given value to the total 
expression of the gene on which the variant is found. This is carried out 
by dividing the ext value with the sum of the expression of all transcripts 
per tissue in transcripts per million (TPM) (Supplementary Fig. 3b). 
The resulting pext value can be interpreted as the proportion of the 
total transcriptional output from a gene that would be affected by the 
given variant annotation in question. If the gene expression value (and 
thus the denominator) in a given tissue is 0, the pext value will not be
available (NA) for that tissue. 
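
The normalization can be stated compactly: for a given tissue, pext is the ext value divided by the summed expression (in TPM) of all transcripts of the gene, and it is undefined when that sum is zero. The sketch below illustrates the arithmetic under those definitions; the function name and the use of None to represent NA are assumptions.

```python
from typing import Dict, Optional


def pext_for_tissue(
    ext_value: float, gene_transcript_tpm: Dict[str, float]
) -> Optional[float]:
    """Proportion of a gene's expression carried by one variant annotation.

    gene_transcript_tpm maps every transcript of the gene to its TPM in the
    tissue. Returns None (NA) when the gene is not expressed in the tissue.
    """
    gene_total = sum(gene_transcript_tpm.values())
    if gene_total == 0:
        return None
    return ext_value / gene_total
```

For instance, an annotation with an ext of 6.5 TPM in a gene whose transcripts total 10 TPM has a pext of 0.65 in that tissue.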

When taking averages across tissues, such unavailable pext values 
are not considered (that is, when taking the mean across tissues, we 
remove NA values). This value can be computed with the tx_annotate() 
function by setting the tx_annotation_type to ‘proportion’. For the 
analyses in this manuscript, we remove reproduction-associated GTEx 
tissues (endocervix, ectocervix, fallopian tube, prostate, uterus, ovary, 
testes and vagina), cell lines (transformed fibroblasts and transformed 
lymphocytes) and any tissue with fewer than 100 samples (bladder, brain
cervical c-1 spinal cord, brain substantia nigra, kidney cortex and minor
salivary gland), resulting in the use of 38 GTEx tissues. 
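
Averaging across tissues while skipping unavailable values and excluded tissues can be sketched as follows. The function name, the tissue labels in the usage lines and the use of None for NA are illustrative assumptions, not names from the GTEx files or the tx_annotation code.

```python
from typing import Dict, FrozenSet, Optional


def mean_pext(
    pext_by_tissue: Dict[str, Optional[float]],
    excluded_tissues: FrozenSet[str] = frozenset(),
) -> Optional[float]:
    """Mean pext across tissues, ignoring NA values and excluded tissues.

    Returns None when no tissue contributes a value.
    """
    values = [
        value
        for tissue, value in pext_by_tissue.items()
        if tissue not in excluded_tissues and value is not None
    ]
    if not values:
        return None
    return sum(values) / len(values)


# Hypothetical per-tissue values for one variant annotation.
example_mean = mean_pext(
    {"brain_cortex": 0.5, "liver": None, "heart": 1.0, "testis": 0.0},
    excluded_tissues=frozenset({"testis"}),
)
```

Here the NA liver value and the excluded testis value are both dropped, so the mean is taken over two tissues only.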

We note that for a minority of genes, when RSEM® assigns higher 
relative expression to non-coding transcripts, the sum of the value 
of coding transcripts can be much smaller than the gene expression 
value for the transcript, resulting in low pext scores for all coding 
variants in the gene, and thus resulting in possible filtering of all 
variants for a given gene. In many cases this seems to be the result of 
spurious non-coding transcripts with a high degree of exon overlap 
with true coding transcripts. To prevent this artefact from affecting 
our analyses, we first calculated the maximum pext score for all vari- 
ants across all protein-coding genes, and removed any gene where 
the maximum pext score was below 0.2. This resulted in the filtering 
of 668 genes, representing 3.3% of all genes analysed. We note that
there is no overlap between the 668 genes and the haploinsufficient gene
list; 97 of the filtered genes are present in OMIM (representing 1.5%
of the OMIM gene list) and 42 filtered genes are considered constrained
(representing 1.4% of LOEUF <0.35, or constrained, genes).
This filter therefore has little effect on variant interpretation in the
context of disease associations.
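
The gene-level filter described above can be sketched as a maximum over all variant pext scores in each gene, followed by a threshold. This is an illustration under stated assumptions: the input is a list of (gene, pext) pairs, NA values are skipped, and genes with no available value are left untouched; the function name is hypothetical.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def genes_below_max_pext(
    variant_pext: Iterable[Tuple[str, Optional[float]]], threshold: float = 0.2
) -> Set[str]:
    """Return genes whose maximum variant pext score is below the threshold."""
    max_by_gene: Dict[str, float] = {}
    for gene, value in variant_pext:
        if value is None:
            continue  # NA values do not contribute to the maximum
        max_by_gene[gene] = max(value, max_by_gene.get(gene, 0.0))
    return {gene for gene, maximum in max_by_gene.items() if maximum < threshold}
```

A gene is removed only if every one of its coding variants scores below 0.2, which targets the quantification artefact described above and leaves genes with genuinely variable exon usage in place.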

The full transcript-expression aware annotation pipeline, 
implemented in Hail 0.2, is fully available at https://github.com/ 
macarthur-lab/tx_annotation with commands laid out for analyses in 
the manuscript. Passing a Hail table through the tx_annotate() func- 
tion returns the same table with a new field entitled ‘tx_annotation’ 
which provides either the ext or pext value per variant-annotation 
pair, depending on parameter choice. We provide a helper function to
extract the worst consequence and the associated expression values 
for these annotations. All analyses in the manuscript are based on the 
worst consequence of variant, ordered by VEP*®. 


Functional validation of transcript-expression aware 
annotation 

Conservation analysis was performed using phyloCSF scores using 
the same file used for the LOFTEE plugin, available publicly in
gs://gnomad-public/papers/2019-tx-annotation/data/other_data/phylocsf_data.tsv.bgz.
We denoted exons with a phyloCSF max open-reading
frame score >1,000 as highly conserved and those with phyloCSF max 
open-reading frame score <-100 as lowly conserved (Supplementary 
Fig. 5a) and evaluated their average usage in GTEx. 

Using the base-level pext values that are used in the gnomAD browser,
we filtered to intervals with high or low conservation, and calculated 
the average pext value in the interval. To evaluate regions with low 
conservation but high expression, we identified genes harbouring 
unconserved regions with the pext value >0.9 for pathway enrichment 
analysis and used the web browser for FUMA GENE2FUNC feature”, 
which incorporates Reactome”, KEGG", Gene Ontology” (GO) as well 
as other ontologies. Default parameters were used for FUMA, with all 
protein coding genes as the background list. Results from FUMA path- 
way analysis are available in Supplementary Fig. 12, and full results are 
available in Supplementary Table 7. 

Analysis of pext values for LOFTEE flags and the MAPS calculation 
were performed using the gnomAD v2.1.1 exome dataset. Calculation 
of MAPS scores was previously described” and is implemented as a Hail 
module, as also described previously’. MAPS is a relative metric, and 
cannot be compared across datasets, but is a useful summary metric 
for the frequency spectrum, indicating deleteriousness as inferred 
from rarity of variation (high values of MAPS correspond to lower fre- 
quency, suggesting the action of negative selection at more deleterious 
sites). The MAPS scores were calculated on the gnomAD v2.1.1 dataset
partitioning upon the LOEUF score and expression bin. The script for
generating MAPS scores is available in the tx-annotation GitHub repository
under /analyses/maps/maps_submit_per_class.py


Manual evaluation of unexpressed regions in haploinsufficient 
developmental delay genes using the GENCODE workflow 

As an orthogonal evaluation of regions flagged as unexpressed with 
the pext metric, we identified any region in 61 haploinsufficient disease
genes with a mean pext value <0.1 in all GTEx tissues and in GTEx brain
samples, owing to the relevance of brain tissues for these disorders, 
regardless of mutational burden in gnomAD. The resulting list of 128 
regions was evaluated by the HAVANA manual annotation group of 
the GENCODE project”®. 

The manual evaluation first established whether the transcript model 
corresponding to the region in question was correct in terms of struc- 
ture, comparing exon-intron combinations, and the accuracy of splice 
sites against the RNA evidence supporting the model. Second, the func- 
tional biotype of each model was reassessed; in particular, whether the 
decision to annotate the model as protein-coding in GENCODE v19 was 
appropriate. Note that GENCODE models that incorporate alternative 
exons or exon combinations in comparison to the ‘canonical’ isoform 
are likely to be annotated as coding if they contain a prospective CDS 
that is considered biologically plausible, based on a mechanistic view
of translation. These re-annotations are summarized in Supplementary 
Table 5. 

We binned cases into three main categories, according to confi- 
dence in both the accuracy and potential functional relevance of the 
overlapping models: (1) ‘error’, in which the model was seen to have 
an incorrect transcript structure and/or a CDS that conflicted with 
updated GENCODE annotation criteria (these annotations had been 
or will be changed in future GENCODE releases based on this evalu- 
ation); (2) ‘putative’, in which the model structure and CDS satisfied 
our current annotation criteria, although we judged the potential 
of the transcript represented to encode a protein with a functional 
role in cellular physiology to be nonetheless speculative (these have 


been maintained as putative protein-coding transcripts in GENCODE); 
(3) ‘validated’, in which we believe it is highly probable that the model 
represents a true protein-coding isoform. High confidence in the 
validity of the CDS was based on comparative annotation, that is, the 
observation of CDS conservation and also the existence of equivalent 
transcript models in other species. GENCODE also annotates transcript 
models as ‘nonsense-mediated decay’ and ‘non-stop decay’, in which a
translation is found that is predicted to direct the RNA molecule into 
cellular degradation programs. Although it has been established that 
such ‘non-productive’ transcription events can have a role in gene 
regulation and thus disease, the interpretation of variants within 
nonsense-mediated decay and non-stop decay CDS regions remains 
challenging. These models were therefore classed in a separate category.


Gene list comparisons 
To evaluate the filtering power of the pext metric for Mendelian vari- 
ants, we evaluated the number of variants that would be filtered with 
an average GTEx pext cutoff of 0.1 (low expression) in the ClinVar and 
gnomAD datasets. We downloaded the ClinVar VCF from the ClinVar 
FTP (version dated 10/28/2018), imported it into Hail, annotated it with 
VEP v85 against Gencode v19, and added pext annotations with the 
tx_annotate() function. All evaluated variants were annotated as HC 
by LOFTEE v1.0, and ClinVar variants were filtered to those marked as 
pathogenic, with no conflicts, and reviewed with at least one star status. 
For variants in 61 haploinsufficient genes, we identified any variant 
identified in at least one individual with any zygosity in both datasets. 
For variants identified in autosomal recessive disease genes, we used a 
list of 1,183 OMIM disease genes deemed to follow a recessive inheritance
pattern by Blekhman et al. and Berg et al. (available as https://
github.com/macarthur-lab/gene_lists/blob/master/lists/all_ar.tsv). We
compared the pext value for all pLoF variants identified in ClinVar versus
any variant in a homozygous state in at least one individual in the gnomAD
exome or genome datasets. Finally, we used a LOEUF cutoff of 0.35 to 
denote constrained genes, and compared any synonymous or pLoF 
variant in these genes in the gnomAD exome or genome datasets. 
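
The filtering comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' Hail pipeline: the function name `fraction_filtered` and the example pext values are hypothetical, and only the 0.1 cutoff comes from the text.

```python
from typing import Iterable

PEXT_CUTOFF = 0.1  # "low expression" threshold stated in the text


def fraction_filtered(pext_values: Iterable[float], cutoff: float = PEXT_CUTOFF) -> float:
    """Return the fraction of variants whose average pext falls below the cutoff."""
    values = list(pext_values)
    if not values:
        raise ValueError("no variants supplied")
    return sum(v < cutoff for v in values) / len(values)


# Made-up pext values for illustration only (not results from the paper).
clinvar_pext = [0.95, 0.88, 0.02, 0.91, 0.76]
gnomad_pext = [0.05, 0.40, 0.01, 0.97, 0.08]

print(f"ClinVar variants filtered: {fraction_filtered(clinvar_pext):.0%}")
print(f"gnomAD variants filtered:  {fraction_filtered(gnomad_pext):.0%}")
```

A useful filter would remove a small fraction of curated pathogenic variants and a larger fraction of population variants, which is the contrast this comparison is designed to expose.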


De novo and rare variant analysis 

De novo variants were collated from previously published studies. We 
collected de novo variants identified in 5,305 probands from trio studies
of intellectual disability/developmental disorders (Hamdan et al.:
n = 41, de Ligt et al.: n = 100, Rauch et al.: n = 51, DDD: n = 4,293,
Lelieveld et al.: n = 820), 1,073 probands with congenital heart disease
with co-morbid developmental delay (Sifrim et al.: n = 512, Chih
Jin et al.: n = 561), 6,430 ASD probands, and 2,179 unaffected controls
from the Autism Sequencing Consortium. We also used a previously
published dataset of variants in 8,437 cases with ASD and/or
attention-deficit/hyperactivity disorder and 5,214 controls from the
Danish Neonatal Screening Biobank. In this analysis, we analysed pLoF
variants identified in highly constrained genes (first LOEUF decile) with
a combined total allele count of < 10 in cases and controls.

We annotated both de novo and rare variants with VEP v85 against 
Gencode v19 and added pext annotations with the tx_annotate() func- 
tion. We then calculated the average pext metric across 11 GTEx brain 
samples and binned them as low (pext < 0.1), medium (0.1 ≤ pext ≤ 0.9)
or high (pext > 0.9) expression. We then calculated the number of pLoF, 
missense, and synonymous variants per pext expression bin. To obtain 
case-control rate ratios and the 95% confidence intervals for de novo 
variant analyses, we used a two-sided Poisson exact test on counts. To 
obtain the odds ratio for the rare variant analysis in ASD/ADHD, we 
used Fisher’s exact test for count data.
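
As a rough illustration of the binning and the case-control comparison, the sketch below assigns pext values to the three bins and compares two Poisson rates exactly. It rests on assumptions: the function names are hypothetical, boundary values of exactly 0.1 and 0.9 are placed in the medium bin, and the test is the standard conditional-binomial form of a two-sided Poisson exact test, which may differ in detail from the software the authors used. Confidence intervals are not computed here.

```python
from math import comb


def pext_bin(pext: float) -> str:
    """Assign a brain-averaged pext value to an expression bin."""
    if pext < 0.1:
        return "low"
    if pext > 0.9:
        return "high"
    return "medium"  # assumption: 0.1 and 0.9 themselves count as medium


def poisson_rate_ratio_test(x_case: int, n_case: int, x_ctrl: int, n_ctrl: int):
    """Two-sided exact comparison of two Poisson rates.

    Conditional on the total count, the case count is binomial with success
    probability n_case / (n_case + n_ctrl) under the null of equal rates.
    Returns (rate ratio, two-sided p value). Suitable for modest counts only.
    """
    total = x_case + x_ctrl
    p0 = n_case / (n_case + n_ctrl)
    pmf = [comb(total, k) * p0**k * (1 - p0) ** (total - k) for k in range(total + 1)]
    observed = pmf[x_case]
    # Sum the probability of every outcome no more likely than the observed one.
    p_value = min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-7)))
    if x_ctrl == 0:
        rate_ratio = float("inf")
    else:
        rate_ratio = (x_case / n_case) / (x_ctrl / n_ctrl)
    return rate_ratio, p_value


# Made-up variant counts; only the cohort sizes (5,305 and 2,179) come from the text.
ratio, p = poisson_rate_ratio_test(x_case=30, n_case=5305, x_ctrl=4, n_ctrl=2179)
print(f"rate ratio = {ratio:.2f}, two-sided p = {p:.3g}")
```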


Isoform quantifications via salmon 
To evaluate whether use of a different isoform quantification tool
would affect results, we compared TCF4 base-level expression
(shown in Fig. 2b), MAPS (Fig. 3c) and the number of
variants filtered in haploinsufficient developmental disease genes in
ClinVar versus gnomAD (Fig. 4a) between the RSEM quantifications used in
this study and quantifications from salmon v.0.12. Due to the intractability
of re-quantifying the entire GTEx dataset, we downloaded and
requantified 151 GTEx brain cortex CRAM files from the V7 dataset.
We first converted CRAMs to fastq files using Picard 2.18.20 and ran 
salmon with the ‘salmon quant -i index -1 fastq1 -2 fastq2 --minAssignedFrags 1
--validateMappings’ command. The index was created with the
‘salmon index -t transcript.fa --type quasi -k 31’ command using the
GENCODE v19 protein-coding and lncRNA transcripts FASTA files. The
existing GTEx RSEM isoform quantifications were filtered to the same 
GTEx brain cortex samples. For the analyses to remain consistent with 
the remainder of the manuscript, we calculated the maximum brain 
cortex pext score for all variants across all protein-coding genes for 
both the RSEM and salmon quantifications, and removed any gene in 
which the maximum pext score was below 0.2. This resulted in filtering
325 genes from the salmon quantification of the brain cortex samples
and 691 genes from the RSEM quantification, corresponding to 1.6% and
3.4% of quantified genes, respectively. We filtered these genes in both
the MAPS and gene list comparison analysis seen in Supplementary
Fig. 11. The WDL script for the quantification pipeline is available at
gs://gnomad-public/papers/2019-tx-annotation/results/salmon_rsem/salmon.wdl
and the commands to obtain results for each individual analysis are in the
tx-annotation GitHub repository under /analyses/rsem_salmon/.
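
The gene-level filter described above (drop any gene whose maximum pext is below 0.2) can be sketched as follows. The function name, the input layout of (gene, pext) pairs and the gene labels are hypothetical; the scripts in the repository remain the reference implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

MIN_MAX_PEXT = 0.2  # threshold stated in the text


def genes_to_remove(
    variant_pext: Iterable[Tuple[str, float]], threshold: float = MIN_MAX_PEXT
) -> Set[str]:
    """Return genes whose maximum per-variant pext is below the threshold."""
    max_pext: Dict[str, float] = defaultdict(float)  # pext is never negative
    for gene, pext in variant_pext:
        max_pext[gene] = max(max_pext[gene], pext)
    return {gene for gene, value in max_pext.items() if value < threshold}


# Made-up (gene, pext) pairs for illustration only.
example = [("GENE_A", 0.05), ("GENE_A", 0.15), ("GENE_B", 0.10), ("GENE_B", 0.80)]
print(sorted(genes_to_remove(example)))
```

Running the same function on the RSEM-derived and salmon-derived scores gives the two gene sets to exclude before comparing the tools.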


Transcript expression aware annotation with fetal isoform 
expression dataset 

Although our analyses were based on transcript expression aware 
annotation from the GTEx v7 dataset, we provide necessary files for 
pext annotation with the Human Brain Development Resource (HBDR) 
fetal brain dataset in gs://gnomad-public/papers/2019-tx-annotation/
data/HBDR_fetal_RNaseq. HBDR includes 558 samples from varying 
brain subregions across developmental time points. We downloaded 
HBDR sample fastq files from the European Nucleotide Archive (study
accession PRJEB14594) and obtained RSEM isoform quantification
on HBDR fastqs using the GTEx v7 quantification pipeline (publicly
available at https://github.com/broadinstitute/gtex-pipeline/), which
briefly involves two-pass alignment with STAR v2.4.2a and isoform
quantification with RSEM v1.2.22. Here, we also removed genes where 
the average pext across HBDR was below 0.2, resulting in the removal 
of 712 genes (3.5% of all analysed genes). The dataset was also used for 
the analysis of base-level expression values in SCN2A shown in
Supplementary Fig. 7d.


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 

We used the gnomAD v.2.1.1 sites Hail 0.2 (https://hail.is) table that is 
accessible publicly at gs://gnomad-public/release/2.1.1 and at https:// 
gnomad.broadinstitute.org. The GTEx v7 gene and isoform expression 
data were downloaded from the GTEx portal (gtexportal.org). The 
LOEUF constraint file was downloaded from gs://gnomad-resources/ 
lof_paper/. All files used in the analyses in the manuscript are available 
in gs://gnomad-public/papers/2019-tx-annotation/. 


Code availability 

The GTEx pipeline for isoform quantification is available publicly 
(https://github.com/broadinstitute/gtex-pipeline/) and briefly involves 
two-pass alignment with STAR v2.4.2a, gene expression quantification
with RNA-SeQC v1.1.8, and isoform quantification with RSEM v1.2.22.
Variants used in all gnomAD analyses in the manuscript passed random 
forest filtering, and all pLoF variants were annotated as high-confidence 
by LOFTEE v.1.0, which is described in an accompanying manuscript.
Scripts to quality control the gnomAD dataset are available at https:// 
github.com/macarthur-lab/gnomad_qc and the scripts to generate files 
for the analyses are available at https://github.com/macarthur-lab/ 
tx_annotation.


Software and code 


Policy information about availability of computer code 


Data collection: No software was used for the collection of data, as this was an opportunistic study.


Data analysis: All code to perform quality control and data analysis is provided in the following GitHub repositories:
https://github.com/macarthur-lab/tx_annotation
https://github.com/macarthur-lab/gnomad_qc
https://github.com/macarthur-lab/gnomad_lof
https://github.com/konradjk/loftee
https://github.com/broadinstitute/gtex-pipeline/

Hail 0.2: https://hail.is/
RSEM v1.2.22: https://deweylab.github.io/RSEM/
RNA-SeQC v1.1.8: https://github.com/broadinstitute/rnaseqc
STAR v2.4.2a: https://github.com/alexdobin/STAR
Alamut v.2.11: https://www.interactive-biosoftware.com/alamut-visual/
Variant Effect Predictor (VEP) v.85: https://uswest.ensembl.org/info/docs/tools/vep/index.html
FUMA GENE2FUNC v1.3.5e: https://fuma.ctglab.nl/
R version 3.4.0: https://cran.r-project.org/bin/macosx/


All datasets are described in the manuscript or Supplementary Information, including deposition of the full dataset at https://gnomad.broadinstitute.org. Data for 
specific analyses are available publicly at gs://gnomad-public/papers/2019-tx-annotation/ and the specific folders therein for analyses are referenced in the 
manuscript for ease of recreating analyses with the data provided. There are no restrictions on the aggregate data released.


Life sciences study design


All studies must disclose on these points even when the disclosure is negative. 


Sample size: This study provides a framework and tool to improve variant interpretation in datasets of any size, no matter how small or large. As a proof of
principle, we use one of the largest datasets of human genetic variation, gnomAD, and the largest functional genomics dataset, GTEx. In other
words, this study is opportunistic, and involves secondary use of available genome, exome and transcriptome data. No sample size was
predetermined.


Data exclusions: Sample QC and variant QC for gnomAD are described extensively in the supplementary methods of the main manuscript. Notably, individuals
with severe pediatric disease, and known first-degree relatives of those with severe pediatric disease, were excluded. For the analyses in the
manuscript, we removed GTEx tissues with low sample numbers, reproductive tissues and non-tissues (i.e. cell lines). For the purpose of our
manuscript, we did not define pre-exclusion criteria for calculation of pext. However, for the analyses in the manuscript, we defined pre-exclusion
tissues: we removed reproduction-associated GTEx tissues (endocervix, ectocervix, fallopian tube, prostate, uterus, ovary, testes,
vagina), cell lines (transformed fibroblasts, transformed lymphocytes) and any tissue with fewer than one hundred samples (bladder, brain
cervical c-1 spinal cord, brain substantia nigra, kidney cortex, minor salivary gland). This is explained in the Methods section of the manuscript.


Replication: We did not attempt to reproduce any findings in a separate but identical dataset, as no other dataset of comparable size exists. However, we
replicate key findings with a separate isoform quantification tool. We successfully replicate the MAPS results, shown in Supplementary Figure
11. We also use an external fetal dataset to provide additional data.


Randomization: As this was a population-based study, and not a case-control study, no randomization was performed.


Blinding: As this was a population-based study, and not a case-control study, blinding was not relevant.


Human research participants


Policy information about studies involving human research participants 


Population characteristics

As an opportunistic collection of data, the participants in this study were not selected based on age, gender, or genotypic
information. As described above, individuals with severe pediatric disease, and known first-degree relatives of those with severe
pediatric disease, were excluded from gnomAD. The populations are provided in Supplementary Table 7 of the
accompanying manuscript by Karczewski et al., and there are 64,754 females and 76,702 males. These data were obtained primarily from
case-control studies of adult-onset common diseases, including cardiovascular disease, type 2 diabetes, and psychiatric disorders.

GTEx v7 collection was similarly opportunistic and has been previously extensively published and reported on. Population
characteristics of the data can be found in Reference 2: GTEx Consortium et al., Genetic effects on gene expression across
human tissues. Nature 550, 204 (2017).


Recruitment

As this was an opportunistic secondary use study, we did not recruit any participants.


Ethics oversight

This study was overseen by the Broad Institute’s Office of Research Subject Protection and the Partners Human Research
Committee, and was given a determination of Not Human Subjects Research.


Note that full information on the approval of the study protocol must also be provided in the manuscript.


Evaluating drug targets through human
loss-of-function genetic variation

Naturally occurring human genetic variants that are predicted to inactivate
protein-coding genes provide an in vivo model of human gene inactivation that
complements knockout studies in cells and model organisms. Here we report three 
key findings regarding the assessment of candidate drug targets using human 
loss-of-function variants. First, even essential genes, in which loss-of-function 
variants are not tolerated, can be highly successful as targets of inhibitory drugs. 
Second, in most genes, loss-of-function variants are sufficiently rare that 
genotype-based ascertainment of homozygous or compound heterozygous 
‘knockout’ humans will await sample sizes that are approximately 1,000 times those 
presently available, unless recruitment focuses on consanguineous individuals. 
Third, automated variant annotation and filtering are powerful, but manual curation 
remains crucial for removing artefacts, and is a prerequisite for recall-by-genotype 
efforts. Our results provide a roadmap for human knockout studies and should guide 
the interpretation of loss-of-function variants in drug development. 


Human genetics is an increasingly crucial source of evidence guiding
the selection of new targets for drug discovery. Most new clinical drug
candidates eventually fail for lack of efficacy, and although in vitro,
cell culture and animal model systems can provide preclinical evidence
that the compound engages its target, too often the target itself is not
causally related to human disease. Candidates targeting genes with
human genetic evidence for disease causality are more likely to reach
approval, and identification of humans with loss-of-function (LoF)
variants, particularly two-hit (homozygous or compound heterozygous)
genotypes, has, for several genes, correctly predicted the safety
and phenotypic effect of pharmacological inhibition. Although these
examples demonstrate the value of human genetics in drug develop- 
ment, important questions remain regarding strategies for identifying 
individuals with LoF variants in a gene of interest, interpretation of 
the frequency—or lack—of such individuals, and whether it is wise to 
pharmacologically target a gene in which LoF variants are associated 
with a deleterious phenotype. 

Public databases of human genetic variation have catalogued pre- 
dicted loss-of-function (pLoF) variants—nonsense, essential splice site, 
and frameshift variants expected to result in a non-functional allele. 
This presents an opportunity to study the effects of pLoF variation in
genes of interest and to identify individuals with pLoF genotypes to
understand gene function or disease biology, or to assess potential
for therapeutic targeting. Although many variants initially annotated
as pLoF do not, in fact, abolish gene function, rigorous automated
filtering can remove common error modes. True LoF variants are generally
rare, and show important differences between outbred, bottlenecked
and consanguineous populations. Counting the number
of distinct pLoF variants in each gene in a population sample allows the
quantification of gene essentiality in humans through a metric named
‘constraint’. Specifically, the rate at which de novo pLoF mutations
arise in each gene is predicted on the basis of rates of DNA mutation,
and the ratio of the count of pLoF variants observed in a database to
the number expected based on mutation rates (obs/exp, or constraint
score) measures how strongly purifying natural selection has removed
such variants from the population. The annotation of pLoF variants
remains imperfect, and continued improvements are being made,
but constraint usefully measures gene essentiality, as demonstrated
by agreement with cell culture and mouse knockout experiments, by
overlap with human disease genes and genes depleted for structural
variation, and by the power of constraint to enrich for deleterious
variants in neurodevelopmental disorders.
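
The constraint score defined above is a simple ratio, sketched here for concreteness. The function name and the example counts are hypothetical, and the sketch takes the expected count as given rather than modelling mutation rates.

```python
def obs_exp_ratio(observed: int, expected: float) -> float:
    """Constraint score: observed pLoF variant count over the count expected."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


# Made-up example: 3 distinct pLoF variants observed where 24 were expected.
score = obs_exp_ratio(observed=3, expected=24.0)
print(f"obs/exp = {score:.1%}")
```

A low ratio indicates that selection has removed most of the pLoF variants that mutation would otherwise have supplied, whereas a ratio near 100% indicates little depletion.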


1Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA. 2Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard,
Cambridge, MA, USA. 3Chemical Biology and Therapeutics Science Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA. 4Analytical and Translational Genetics Unit,
Massachusetts General Hospital, Boston, MA, USA. 5Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA. 6Henry and Allison McCance Center for Brain
Health, Massachusetts General Hospital, Boston, MA, USA. 7Department of Neurology, Massachusetts General Hospital, Boston, MA, USA. 8Prion Alliance, Cambridge, MA, USA. 9Wellcome
Sanger Institute, Hinxton, Cambridgeshire, UK. 10National Heart and Lung Institute and MRC London Institute of Medical Sciences, Imperial College London, London, UK. 11Centre for
Translational Bioinformatics, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London and Barts Health NHS Trust, London,
UK. 12School of Basic and Medical Biosciences, Faculty of Life Sciences and Medicine, King’s College London, London, UK. 13Blizard Institute, Barts and The London School of Medicine and
Dentistry, Queen Mary University of London, London, UK. 14Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA. 15Present address: Centre for Population
Genomics, Garvan Institute of Medical Research and UNSW Sydney, Sydney, Australia. 16Present address: Centre for Population Genomics, Murdoch Children’s Research Institute, Melbourne,
Australia. *Lists of authors and their affiliations appear at the end of the paper. e-mail: eminikel@broadinstitute.org; d.macarthur@garvan.org.au


Nature | Vol 581 | 28 May 2020 | 459 


Analysis 


[Figure 1. Panel a: histogram of proportion of genes (%) against pLoF obs/exp ratio (%); drug targets, mean = 44%; all genes, mean = 52%. Panel b: forest plot against pLoF obs/exp ratio (%) for all genes; comparators (olfactory receptors, homozygous LoF tolerant, autosomal recessive, autosomal dominant, essential in culture, ClinGen haploinsufficient); all drug targets; and drug targets by effect (positive, negative, other and unknown). Panel c: example drug targets along the pLoF obs/exp axis, with the haploinsufficient gene mean marked: PCSK9 (cholesterol-lowering antibodies), ACE (angiotensin-converting enzyme inhibitors), HRH1 (H1 antihistamines), P2RY12 (antiplatelets), ATP4A (proton pump inhibitors), DHFR (antifolates), PDE5A (phosphodiesterase 5 inhibitors), HMGCR (statins), PTGS2 (non-steroidal anti-inflammatory drugs), TUBB (cytoskeleton disruptors), CHRM1 (M1-selective antimuscarinics), TOP1 (topoisomerase I inhibitors).]


Fig. 1 | pLoF constraint in drug targets. a, Histogram of pLoF obs/exp values
for all genes (black, n = 17,604) versus drug targets (blue, n = 383). b, Forest plot
of means (dots) and 95% confidence intervals of the mean (line segments), for
constraint in the indicated gene sets (data sources and n values in Extended
Data Table 1). For drug effect, ‘positive’ indicates agonist, activator or inducer,
whereas ‘negative’ indicates antagonist, inhibitor or suppressor, for example.
c, Examples of drug targets and corresponding drug classes from across the 
constraint spectrum. Details in Extended Data Table 2. 


Building on these insights, here we leverage pLoF variation in the 
Genome Aggregation Database (gnomAD) v2 dataset of 141,456 individuals
to answer open questions in the interpretation of human pLoF
variation in disease biology and drug development. 


Constraint in human drug targets 


We compared constraint in the targets of approved drugs extracted
from DrugBank (n = 383) versus all protein-coding genes (n = 17,604).
Drug targets were, on average, just slightly more constrained than all
genes (mean 44% versus 52%, nominal P = 0.00028, D = 0.11, two-sided
Kolmogorov-Smirnov test), but the two gene sets had a qualitatively
similar distribution of scores, ranging from intensely constrained
(0% obs/exp) to not at all constrained (≥100% obs/exp) (Fig. 1a). Constraint
scores showed clear divergence between categories of genes
(Extended Data Table 1) expected to be more or less tolerant of inactivation
(Fig. 1b), as previously reported, validating the usefulness of
constraint as a measure of gene essentiality. Nonetheless, when drug
targets were stratified by drug effect (Fig. 1b), modality, or indication 
(Extended Data Fig. 1), no statistically significant differences between 
subsets of drug targets were observed. 
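
The D value reported for the Kolmogorov-Smirnov test is the largest vertical gap between the two empirical distribution functions. The sketch below computes only that statistic, on made-up scores; the function name is hypothetical, and the p value is left to a statistics library (for example `scipy.stats.ks_2samp`).

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D: maximum gap between the two ECDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The ECDFs only change at observed values, so checking those suffices.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Made-up obs/exp scores for two small gene sets, for illustration only.
drug_targets = [0.1, 0.4, 0.5, 0.9]
all_genes = [0.3, 0.6, 0.8, 1.2]
print(f"D = {ks_statistic(drug_targets, all_genes):.2f}")
```

A small D with a small P, as reported here, is the signature of a large sample detecting a modest shift between two broadly similar distributions.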

The slightly but significantly lower obs/exp value among drug targets 
may superficially appear to provide evidence that constrained genes 
make superior drug targets. Stratification of drug targets by protein
family, human disease association, and tissue expression, however,
argues against this interpretation. Drug targets are strongly enriched 
for a few canonically ‘druggable’ protein families, for genes known 
to be involved in human disease, and for genes with tissue-restricted 
expression; each of these properties is in turn correlated with either 
significantly stronger or weaker constraint (Extended Data Fig. 2). 
Although controlling for these correlations does not abolish the trend 
of stronger constraint among drug targets, the correlation of so many 
observed variables with the status of a gene as a drug target argues that 
many unobserved variables probably also confound interpretation of 
the lower mean obs/exp value among drug targets. 

The overall constraint distribution of drug targets (Fig. 1a) also
argues against the view that a gene in which LoF is associated with a
deleterious phenotype cannot be successfully targeted. Indeed, 19%
of drug targets (n = 73), including 52 targets of inhibitors, antagonists
or other ‘negative’ drugs, have lower obs/exp values than the average
(12.8%) for genes known to cause severe diseases of haploinsufficiency
(ClinGen level 3). To determine whether this finding could be explained
by a particular class or subset of drugs, we examined constraint in several
well-known example drug targets (Fig. 1c, Extended Data Table 2).
Some heavily constrained genes are targets of cytotoxic chemotherapy
agents such as topoisomerase inhibitors or cytoskeleton disruptors,
a set of drugs intuitively expected to target essential genes. However,
genes with near-complete selection against pLoF variants also include
HMGCR and PTGS2, the targets of highly successful, chronically used
inhibitors: statins and aspirin.

These human in vivo data add to the evidence from other species
and models that essential genes can be good drug targets. Homozygous
knockouts of Hmgcr and Ptgs2 are lethal in mice. Drug targets
exhibit higher inter-species conservation than other genes. Targets
of negative drugs include 14 genes with lethal heterozygous knockout
mouse phenotypes reported and 6 reported as essential in human
cell culture.


Prospects for finding human ‘knockouts’ 


Although constraint alone is not adequate to nominate or exclude
drug targets, the study of individuals with single-hit (heterozygous)
or two-hit (‘knockout’) LoF genotypes in a gene of interest can be highly
informative about the biological effect of engaging that target. To
assess prospects for ascertaining knockout individuals, we computed 
the cumulative allele frequency (CAF) of pLoF variants in each gene 
(Methods), and then used this to estimate the expected frequency of 
two-hit individuals under different population structures (Fig. 2) in 
the absence of natural selection. 

Whereas gnomAD is now large enough to include at least one pLoF 
heterozygote for most (15,317 out of 19,194; 79.8%) genes, ascertain- 
ment of total knockout individuals in outbred populations will require 
1,000-fold larger sample sizes for most genes: the median expected 
two-hit frequency of a gene is just six per billion (Fig. 2a). Even if every
human on Earth were sequenced, there are 4,728 genes (24.6%) for 
which identification of even one two-hit individual would not be 
expected in outbred populations. Intuitively, because the sample size 
of gnomAD today is larger than the square root of the world population, 
variants so far seen in zero or only a few heterozygous individuals are 
not likely to ever be seen in a homozygous state in outbred popula- 
tions, except where variants prove common in populations not yet 
well-sampled by gnomAD. 
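
The arithmetic behind these statements can be checked with a short sketch. It assumes that, without selection and under random mating, the two-hit frequency is the square of the cumulative allele frequency (a Hardy-Weinberg simplification; the Methods may define it differently). The world population figure and all names are assumptions introduced for illustration.

```python
from math import sqrt

GNOMAD_N = 141_456        # individuals in gnomAD v2 (from the text)
WORLD_POPULATION = 7.8e9  # assumption: approximate 2020 world population


def two_hit_frequency_outbred(caf: float) -> float:
    """Expected two-hit genotype frequency under random mating (assumed CAF**2)."""
    return caf ** 2


median_two_hit = 6e-9                      # "six per billion" (from the text)
implied_median_caf = sqrt(median_two_hit)  # CAF that would give that frequency
samples_for_one_two_hit = 1 / median_two_hit
fold_increase = samples_for_one_two_hit / GNOMAD_N

# A pLoF allele seen once among the 2 * GNOMAD_N sampled chromosomes:
singleton_af = 1 / (2 * GNOMAD_N)
expected_homozygotes_worldwide = two_hit_frequency_outbred(singleton_af) * WORLD_POPULATION

print(f"implied median CAF: {implied_median_caf:.1e}")
print(f"fold increase over gnomAD for one expected two-hit individual: {fold_increase:.0f}")
print(f"expected homozygotes worldwide for a gnomAD singleton: {expected_homozygotes_worldwide:.2f}")
```

Under these assumptions the required fold increase comes out on the order of 1,000, and a variant seen in a single gnomAD heterozygote is expected in fewer than one homozygote worldwide, which is the square-root intuition given in the text.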

Because population bottlenecks can result in very rare variants
present in a founder rising to an unusually high frequency, we also
considered knockout discovery in bottlenecked populations, using
Finnish individuals in gnomAD as an example. Although this population
structure can enable well-powered association studies for the small
fraction of genes in which pLoF variants drifted to high frequency due
to the bottleneck, overall, identification of two-hit pLoF individuals
for a pre-specified gene of interest appears equally or more difficult
in Finnish individuals than in outbred populations (Fig. 2b, Extended
Data Fig. 3), because rare variants not present in a founder have been
effectively removed from the population.


Fig. 2 | Prospects for discovery of human knockouts. a-c, Histograms of
genes by expected heterozygote frequency (orange), and two-hit homozygote and
compound heterozygote frequency (purple). a, Outbred populations. b, Finnish
individuals; an example of a bottlenecked population. c, Consanguineous
individuals. d, Current status of pLoF or disease association discovery for all
protein-coding genes. e, Projected sample sizes required for discovery of two-hit
individuals (solid lines) and for statistical inference that a two-hit genotype is lethal
if no such individuals are observed (dashed lines), for ‘pLoF observed in gnomAD’
genes (d) for consanguineous and outbred individuals.

In consanguineous individuals, parental relatedness greatly increases
the frequency of homozygous pLoF genotypes. The n = 2,912 individuals
in the East London Genes & Health (ELGH) cohort who report having
parents who are second cousins or closer have on average 5.8% of
their genomes autozygous. Here, the expected frequency of two-hit 
individuals is many times higher than in outbred populations, at five 
per million for the median gene (Fig. 2c). 
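
A common way to model the effect of autozygosity, offered here as an assumption rather than the authors' stated formula, is to treat a fraction a of the genome as identical by descent: within that fraction a single pLoF allele suffices, and elsewhere the random-mating term applies. With the 5.8% autozygosity quoted above and a median CAF inferred from the six-per-billion outbred figure, this lands close to the five per million reported.

```python
from math import sqrt

AUTOZYGOSITY = 0.058  # mean autozygous genome fraction in the ELGH subset (from the text)


def two_hit_frequency_consanguineous(caf: float, autozygosity: float = AUTOZYGOSITY) -> float:
    """Assumed model: a * CAF within autozygous segments plus (1 - a) * CAF**2 elsewhere."""
    autozygous_part = autozygosity * caf
    random_mating_part = (1 - autozygosity) * caf ** 2
    return autozygous_part + random_mating_part


median_caf = sqrt(6e-9)  # inferred from the outbred median, assuming frequency = CAF**2
outbred = median_caf ** 2
consanguineous = two_hit_frequency_consanguineous(median_caf)

print(f"outbred:        {outbred:.1e}")
print(f"consanguineous: {consanguineous:.1e}")
print(f"enrichment:     {consanguineous / outbred:.0f}-fold")
```

Because the autozygous term is linear in CAF while the outbred term is quadratic, the enrichment is largest for genes in which pLoF variants are rarest.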

These projections allow us to draft a roadmap for discovery of human
knockouts across 19,194 genes (Fig. 2d, e). Online Mendelian Inheritance 
in Man (OMIM) already describes human disease association for 3,367 
genes (18%), although the discovery of LoF individuals in population 
databases will still be valuable for assessing penetrance and identifying 
LoF syndromes of known gain-of-function genes. Another 3,421 genes 
(18%) without known human disease association have two-hit pLoF
genotypes reported in gnomAD, ELGH, PROMIS, deCODE or UK
Biobank, which suggests that this genotype may be tolerated. An additional
2,190 genes (11%) appear intolerant of heterozygous inactivation
(pLI score > 0.9) in gnomAD, a set expected to be enriched for genes
with severe heterozygous and lethal homozygous LoF phenotypes. 


Another 2,781 genes (14%) have no pLoF variants observed in gnomAD, 
but our sample size is not yet large enough to robustly infer LoF intol- 
erance. For these genes, observation of outbred two-hit individuals is 
not expected, and we cannot yet assess the feasibility of identifying 
consanguineous two-hit individuals because we lack an estimate of 
pLoF allele frequency. 

This leaves 7,435 genes (39%) for which one or more pLoFs are 
observed in gnomAD, but strong LoF intolerance cannot be determined,
two-hit genotypes have not been observed, and a human disease
phenotype is not known. We projected the sample sizes required to
identify knockout individuals for these genes (Fig. 2e). In outbred populations,
current sample sizes would need to increase by approximately
1,000-fold before ascertainment of a single two-hit LoF individual
would be expected for the typical gene. By contrast, around a 10- to
100-fold increase from current consanguineous sample size, meaning 
hundreds of thousands of individuals in absolute terms, would identify 
at least one two-hit LoF individual for the typical gene. Among other 
simplifying assumptions (Methods), these projections presume that 
complete knockout is tolerated. When only one or a few two-hit indi- 
viduals are expected in a dataset, the absence of any such individuals 
can be due to either early lethality, a severe clinical phenotype incom- 
patible with inclusion in gnomAD, or simply chance. Thus, the ability 
to infer lethality of the two-hit genotype based on statistical evidence
will lag behind the identification of two-hit individuals where they 
do exist (Fig. 2e). For some genes, inference of lethality will always 
remain impossible in outbred populations, though it may be feasible 
in consanguineous individuals. 
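The lag between finding two-hit individuals and inferring lethality can be illustrated with a simple Poisson sketch. This is an illustrative assumption, not the projection method behind Fig. 2e: if a two-hit genotype has expected frequency f in a cohort of n individuals, the expected count is nf and the chance of observing none is approximately exp(-nf). The function names, the example frequency and the 0.05 threshold are all hypothetical.

```python
import math

def prob_no_two_hit(n_individuals, two_hit_freq):
    """Poisson probability of observing zero two-hit individuals by chance."""
    expected = n_individuals * two_hit_freq
    return math.exp(-expected)

def n_to_expect_one(two_hit_freq):
    """Cohort size at which one two-hit individual is expected."""
    return math.ceil(1.0 / two_hit_freq)

def n_to_infer_lethality(two_hit_freq, alpha=0.05):
    """Cohort size at which seeing zero two-hit individuals has P < alpha."""
    return math.ceil(-math.log(alpha) / two_hit_freq)

freq = 1e-6  # hypothetical two-hit genotype frequency for one gene
print("expect one two-hit individual at n =", n_to_expect_one(freq))
print("absence becomes informative at n =", n_to_infer_lethality(freq))
```

Under this model the cohort needed to interpret an absence is about three times the cohort needed to expect a single two-hit individual, which is one way to see why statistical inference of lethality trails discovery.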


Curation of pLoF variants 


Where pLoF variants can be identified, they are a valuable resource for 
assessing the effect of lifelong reduction in gene dosage. To highlight 
the challenges and opportunities of identifying such variants, we manu- 
ally curated gnomAD data and the scientific literature for six genes 
associated with gain-of-function (GoF) neurodegenerative diseases, 
for which inhibitors or suppressors are under development: HTT
(Huntington's disease), MAPT (tauopathies), PRNP (prion disease), 
SOD1 (amyotrophic lateral sclerosis), and LRRK2 and SNCA (Parkinson's 
disease). The results (Fig. 3, Extended Data Table 3) illustrate four points 
about pLoF variant curation. 

First, other things being equal, genes with longer coding sequences 
offer more opportunities for LoF variants to arise, and so tend to 
have higher cumulative frequencies of LoF variants, unless they are
heavily constrained. Ascertainment of LoF individuals is thus harder 
for shorter and/or more constrained genes, even though these may 
be good targets. 

Second, many variants annotated as pLoF are false positives, and
these are enriched for higher allele frequencies, so that both filtering 
and curation have an outsized effect on the cumulative allele frequency 
of LoF. Studies of human pLoF variants lacking stringent curation can 
therefore easily dilute results with false pLoF carriers. 

Third, after careful curation, cumulative LoF allele frequency is 
sometimes sufficiently high to place certain bounds on what heterozy- 
gote phenotype might exist. For example, GoF mutations causing 
genetic prion disease have a genetic prevalence of approximately 1 
in 50,000 and have been known for three decades, with thousands
of cases identified, making it unlikely that a comparably severe and 
penetrant haploinsufficiency syndrome associated with PRNP would 
have gone unnoticed to the present day despite being more than twice 
as common (roughly 1 in 18,000). Similar arguments can be made for
HTT, LRRK2 and SOD1 genes (Extended Data Tables 3, 4). Of course, 
this does not rule out a less severe or less penetrant heterozygous 
LoF phenotype. 

Fig. 3 | Insights from non-random positional distributions of pLoF variants.
a-c, HTT (a), MAPT, with brain expression data from GTEx (b) and PRNP, a
single protein-coding exon with domains removed by post-translational
modification in grey (c), showing previously reported variants and those
newly identified in gnomAD and in the literature (Extended Data Table 5). GPI,
glycosylphosphatidylinositol. Detailed variant curation results are provided in
Supplementary Table 1.

Finally, careful inspection of the distributions of pLoF variants
can reveal important error modes or disease biology. HTT, MAPT
and PRNP genes each have different non-random positional distributions
of pLoF variants (Fig. 3). High-frequency HTT pLoF variants
cluster in the polyglutamine/polyproline repeat region of exon 1 and
appear to be alignment artefacts (Fig. 3a). True HTT LoF variants are
rare and the gene is highly constrained, which might suggest some
fitness effect in a heterozygous state in addition to the known severe
homozygous phenotype, although the frequency of LoF carriers
still argues against a penetrant syndromic illness, consistent with
the lack of phenotype reported in heterozygotes identified so far.
High-frequency MAPT pLoF variants cluster in exons not expressed in
the brain in GTEx data, and all remaining pLoFs appear to be alignment
or annotation errors (Fig. 3b). No true LoFs are observed in MAPT,
although our sample size is insufficient to prove that MAPT LoF is not
tolerated: among constitutive brain-expressed exons, we expect 12.6
LoFs and observe 0, giving a 95% confidence interval upper bound of
23.7% for obs/exp values. PRNP-truncating variants in gnomAD cluster
in the N terminus; the sole C-terminal truncating variant in gnomAD
is a dementia case (Extended Data Table 5), consistent with variants
at codon ≥145 causing a pathogenic gain-of-function through change
in localization (Fig. 3c). Within codons 1-144, PRNP is unconstrained
(Extended Data Table 3), and no neurological phenotype has been
identified in individuals with truncating variants so far, consistent
with the hypothesis that N-terminal truncating variants are true LoF
and are tolerated in a heterozygous state.
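The MAPT bound quoted above can be approximated with a one-sided Poisson argument. The sketch below is an assumption about how such a bound might be obtained, not a restatement of the interval method used for the published figure: with zero variants observed, the largest obs/exp ratio still compatible with the data at a given confidence solves exp(-ratio × expected) = 1 - confidence. The function name is hypothetical.

```python
import math

def obs_exp_upper_bound_zero_observed(expected, confidence=0.95):
    """One-sided upper bound on obs/exp when zero variants are observed.

    With 0 observed, P(0 | ratio * expected) = exp(-ratio * expected).
    Setting this equal to 1 - confidence and solving gives the bound.
    """
    return -math.log(1.0 - confidence) / expected

upper = obs_exp_upper_bound_zero_observed(12.6)
print(f"upper bound on obs/exp: {upper:.3f}")
```

For 12.6 expected variants this lands near 0.24, close to the 23.7% reported in the text; the small difference suggests the published interval was derived slightly differently.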
Discussion


Studying human gene inactivation can illuminate human biology 
and guide the selection of drug targets, complementing mouse knockout 
studies, but analysis of any one gene requires genome-wide context to
set expectations and guide inferences. Here we have used gnomAD data 
to provide context to aid in the interpretation of human LoF variants. 

Targets of approved drugs range from highly constrained to completely 
unconstrained. There may be several reasons why some genes apparently
tolerate pharmacological inhibition but not genetic inactivation. LoF
variants in constitutive exons should affect all tissues for life, whereas
drugs differ in tissue distribution and timing and duration of use. Many
drugs known or suspected to cause fetal harm are tolerated in adults,
and might target developmentally important genes. Constraint is thought
to primarily reflect selection against heterozygotes, the effective gene
dosage of which may differ from that achieved by a drug. Constraint measures
natural selection over centuries or millennia; the environment of our
ancestors presented different selective pressures from what we face today.
Actions of small-molecule drugs may not map one-to-one onto genes.
Regardless, these human in vivo data show that even a highly deleterious 
knockout phenotype is compatible with a gene being a viable drug target. 

For most genes, the lack of total knockout individuals identified so 
far does not yet provide statistical evidence that this genotype is not 
tolerated. Indeed, for many genes, such evidence may never be attain- 
able in outbred populations. Bottlenecked populations, individually, 
are unlikely to yield two-hit individuals for a pre-specified gene of inter- 
est, although the sequencing of many different, diverse bottlenecked 
populations will certainly expand the set of genes accessible by this 
approach. Identification of two-hit individuals will be most greatly 
aided by increased investment in consanguineous cohorts, in which the 
sample size required for any given gene is often orders of magnitude 
lower than in outbred populations. Our analysis is limited by sample 
size, insufficient diversity of sampled populations, and simplifying 
assumptions about population structure and distribution of LoF vari- 
ants, so our calculations should be taken as rough, order-of-magnitude 
estimates. Nonetheless, this strategic roadmap for the identification 
of human knockouts should inform future research investments and 
rationalize the interpretation of existing data. 

Recall-by-genotype efforts are only valuable if the variants in question
are correctly annotated. Automated filtering and transcript
expression-aware annotation are powerful tools, but we demonstrate
the continued value of manual curation for excluding further false posi- 
tives, assessing and interpreting the cumulative allele frequency of true 
LoF variants, and identifying error modes or biological phenomena that 
give rise to non-random distributions of pLoF variants across a gene. 
Such curation is essential before any recontact efforts, and establishing
methods for high-throughput functional validation of LoF variants
is a priority. Our curation of pLoF variants in neurodegenerative
disease genes is limited by a lack of functional validation and detailed
phenotyping; a companion paper demonstrates a deeper investigation
of the effects of LoF variants in the LRRK2 gene.

Drug development projects may increasingly be accompanied by 
efforts to phenotype human carriers of LoF variants. With the cost of 
drug discovery driven overwhelmingly by failure, successful interpretation
of LoF data to select the right targets and right clinical pathways
will yield outsize benefits for research productivity and, ultimately, 
human health. 
Methods


No statistical methods were used to predetermine sample size. The 
experiments were not randomized, and investigators were not blinded 
to allocation during experiments and outcome assessment. 


Data sources 

pLoF analyses used the gnomAD dataset of 141,456 individuals. For data
consistency, all genome-wide constraint and CAF analyses used only the 
125,748 gnomAD exomes. Curated analyses of individual genes used all 
141,456 individuals including 15,708 whole genomes. Gene lists used in 
this study were extracted from public data sources between September 
2018 and June 2019. Data sources and criteria for gene list extraction 
are shown in Extended Data Table 1. This study was performed under 
ethical approval from the Partners Healthcare Institutional Research 
Board (2013P001339/MGH) and the Broad Institute Office of Research 
Subjects Protection (ORSP-3862). All research participants provided 
informed consent. 


Calculation of pLoF constraint 

The calculation of constraint values for genes has been described in 
general elsewhere and for this dataset specifically by Karczewski
et al. Constraint calculations used LOFTEE-filtered (‘high confidence’)
single-nucleotide variants (which for pLoF means nonsense
and essential splice site mutations) found in gnomAD exomes with 
minor allele frequency <0.1%. Only unique canonical transcripts for 
protein-coding genes were considered, yielding 17,604 genes with 
available constraint values. For curated genes (Extended Data Table 2), 
the number of observed variants passing curation was divided by the 
expected number of variants to yield a curated constraint value. For 
PRNP, the expected number of variants was adjusted by multiplying 
by the ratio of the sum of mutation frequencies for all possible pLoF 
variants in codons 1-144 to the sum of mutation frequencies for all 
possible pLoF variants in the entire transcript, yielding 6 observed 
out of 6.06 expected. For MAPT, the expected number of variants was 
taken from Ensembl transcript ENST00000334239, which includes
only the exons identified as constitutively brain-expressed in Fig. 3b
(exon numbering previously described).
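The curated constraint value described here is a simple ratio, and the PRNP adjustment rescales the expected count to a sub-region of the transcript. A minimal sketch follows; the function names are hypothetical, and the mutation-frequency sums passed to the second function would come from a mutational model that is not reproduced here.

```python
def curated_constraint(observed, expected):
    """Curated obs/exp: variants passing curation divided by expected variants."""
    return observed / expected

def region_adjusted_expected(expected_full, mu_region_sum, mu_transcript_sum):
    """Scale a transcript-wide expected count to a sub-region.

    mu_region_sum and mu_transcript_sum are the summed mutation frequencies of
    all possible pLoF variants in the region and in the whole transcript.
    """
    return expected_full * (mu_region_sum / mu_transcript_sum)

# PRNP codons 1-144, using the counts stated in the text: 6 observed, 6.06 expected.
prnp = curated_constraint(6, 6.06)
print(f"PRNP curated obs/exp: {prnp:.0%}")
```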


Calculation of pLoF heterozygote and homozygote/compound 
heterozygote frequencies 

LOFTEE-filtered high-confidence pLoF variants with minor allele fre- 
quency <5% in 125,748 gnomAD exomes were used to compute the 
proportion of individuals without a loss-of-function variant (q); the 
CAF was computed as p = 1 - sqrt(q). This approach conservatively
assumes that, if an individual has two different pLoF variants, they are
in cis to each other and count as only one pLoF allele. 
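The formula follows from Hardy-Weinberg proportions: if p is the cumulative pLoF allele frequency, the fraction of individuals carrying no pLoF allele is q = (1 - p)², so p = 1 - sqrt(q). A minimal sketch, with a hypothetical function name and made-up example counts:

```python
import math

def cumulative_allele_frequency(n_without_plof, n_total):
    """CAF from the proportion of individuals carrying no pLoF variant.

    An individual with several pLoF variants counts once, which matches the
    conservative assumption that multiple variants are in cis.
    """
    q = n_without_plof / n_total
    return 1.0 - math.sqrt(q)

# Hypothetical gene: 9,801 of 10,000 individuals carry no pLoF variant.
caf = cumulative_allele_frequency(9_801, 10_000)
print(f"CAF = {caf:.4f}")
```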

For outbred populations (Fig. 2a), we used the value of p from all 
125,748 gnomAD exomes, as this allows the largest possible sample 
size. This includes some individuals from bottlenecked populations, 
for which the distribution of p does differ from outbred populations, 
but these individuals are a small proportion of gnomAD exomes (12.6%). 
This also includes some consanguineous individuals, but these are an 
even smaller proportion of gnomAD exomes (2.3%), and any difference 
in the value of p between consanguineous and outbred populations is
expected to be very small. Heterozygote frequency was calculated as
2p(1 - p) and homozygote and compound heterozygote frequency was
calculated as p². Lines indicate the size of gnomAD (141,456 individuals)
and the world population (6.69 billion). 
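The outbred calculation can be sketched as follows. The function names and the example CAF are illustrative assumptions; only the two formulas and the gnomAD size come from the text.

```python
def outbred_genotype_frequencies(p):
    """Return (heterozygote, two-hit) frequencies for cumulative pLoF frequency p."""
    heterozygote = 2.0 * p * (1.0 - p)
    two_hit = p ** 2  # homozygotes plus compound heterozygotes
    return heterozygote, two_hit

def expected_two_hit_individuals(p, n_individuals):
    """Expected number of two-hit individuals in an outbred sample of size n."""
    return n_individuals * p ** 2

p = 0.001  # hypothetical CAF of 0.1%
het, two_hit = outbred_genotype_frequencies(p)
print(het, two_hit, expected_two_hit_individuals(p, 141_456))
```

With this hypothetical CAF, fewer than one two-hit individual is expected even in a sample the size of gnomAD, which illustrates why outbred cohorts rarely yield knockouts for a given gene.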

For bottlenecked populations (Fig. 2b), we used the value of p from 
the 10,824 Finnish exomes only. Lines indicate the number of Finn- 
ish individuals in gnomAD (12,526) and the population of Finland 
(5.5 million). 

For consanguineous individuals (Fig. 2c), we again used the value of
p from all gnomAD exomes, because p is not expected to differ greatly
in consanguineous versus outbred populations. We used the mean
proportion of the genome in runs of autozygosity (a) from individuals
self-reporting second cousin or closer parents in East London Genes
& Health, a = 0.05766 (rounded to 5.8%). Heterozygote frequency was
calculated as 2p(1 - p) and homozygote and compound heterozygote
frequency was calculated as (1 - a)p² + ap. Lines indicate the number
of consanguineous South Asian individuals in gnomAD (n = 2,912, by
coincidence the same number as report second cousin or closer parents
in ELGH) based on F > 0.05 (a conservative estimate, because second
cousin parents are expected to yield F = 0.015625), and the estimated
number of individuals in the world with second cousin or closer parents
(10.4% of the world population).
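The effect of autozygosity on two-hit frequency can be sketched with the formula above. The function name and the example CAF are hypothetical; the value of a is the one stated in the text.

```python
A_ELGH = 0.05766  # mean autozygous proportion stated in the text

def consanguineous_two_hit_frequency(p, a=A_ELGH):
    """Two-hit frequency when a fraction a of the genome is autozygous.

    Outside autozygous runs the outbred expectation p**2 applies; inside a run
    both copies are identical by descent, so a single pLoF allele (frequency p)
    is enough to give a two-hit genotype.
    """
    return (1.0 - a) * p ** 2 + a * p

p = 0.001  # hypothetical CAF of 0.1%
ratio = consanguineous_two_hit_frequency(p) / p ** 2
print(f"fold increase over outbred expectation: {ratio:.1f}")
```

For rare alleles the ap term dominates, so the fold increase is roughly a/p, which is why the required sample size can be orders of magnitude lower in consanguineous cohorts.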

Several caveats apply to our CAF analysis. First, our approach naively
treats genes with no pLoFs observed as having p = 0, even though pLoFs
might be discovered at a larger sample size. Second, we naively group
all populations together, even though the distribution of populations
sampled in gnomAD does not reflect the world population; we believe
that this is reasonable because CAF for many genes is driven by singletons
and other ultra-rare variants for which frequency is not expected to
differ appreciably by continental population. (It is important to note
that the histograms shown in Fig. 2 reflect the expected frequency of
heterozygotes and homozygotes/compound heterozygotes, based on
gnomAD allele frequency, rather than the actual observed frequency
of individuals with these genotypes in gnomAD.) Third, we use only
protein-truncating variants annotated as pLoF in gnomAD. Structural
and non-coding variation resulting in a loss of function may be missed
in exomes, and missense variants resulting in a loss of function cannot
be rigorously annotated. Fourth, we naively treat genes with one pLoF
allele observed as having p = 1/(2 × 125,748), even though on average
singleton variants have a true allele frequency lower than their nominal
allele frequency. Fifth, the variants included in this analysis are filtered
but have not been manually curated or functionally validated, so some
will ultimately prove not to be true LoF. These false positives tend to
be more common and will have disproportionately contributed to the
cumulative LoF allele frequency. Sixth, as described in the main text,
our calculations assume that complete knockout is tolerated, which 
will not be true for some genes. We therefore also include a projection 
of the sample size needed to infer lethality from the absence of two-hit 
knockout individuals (Fig. 2e). Points one to three will tend to lead to 
underestimation of the true complete knockout frequency, whereas 
points four to six will tend to lead to overestimation. On balance, our 
calculations may reflect an upper bound of complete knockout fre- 
quency for most genes owing to the strong influence of factors five and 
six. Finally, as a matter of comparison between population structures, 
the sample size for all gnomAD exomes (Fig. 2a, c) is larger than for 
only Finnish exomes (Fig. 2b). For a version of Fig. 2 with the global
gnomAD population downsampled to the same sample size as the
gnomAD Finnish population, see Extended Data Fig. 3.


Knockout roadmap 

For the knockout ‘roadmap’ (Fig. 2d, e), we classified genes according 
to the current status of human disease association and LoF ascertain- 
ment. Genes were classified as having a Mendelian disease association 
if they were present in OMIM with the filters described in Extended 
Data Table 1. 

Remaining genes were classified as ‘2-hit LoF reported’ based on 
presence in one or more of the following gene lists: homozygous 
LoF genotypes in gnomAD curated as previously described; filtered
homozygous LoF genotypes in runs of autozygosity with minor allele
frequency <1% in canonical transcripts in the Bradford, Birmingham
and ELGH cohorts (total n = 8,925); observed number of imputed
homozygotes >1 or number of compound heterozygous carriers where
minor allele frequency <2% (for both variants) in deCODE; homozygous
LoF reported in PROMIS; homozygous LoF with minor allele
frequency <1% in UK Biobank.


The remainder of genes were sequentially classified as ‘likely haploinsufficient’
if pLI > 0.9 in gnomAD, ‘pLoF not yet observed’ if CAF = 0 in
gnomAD, and, finally, ‘pLoF observed in gnomAD’ if CAF > 0 in gnomAD.
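The sequential classification can be sketched as a chain of early returns, applied in the order given above. The function name, argument names and the label for the first category are hypothetical; the inputs are assumed to have been derived already from the gene lists described in this section.

```python
def classify_gene(in_omim, two_hit_reported, pli, caf):
    """Assign a gene to the first roadmap category whose criterion it meets.

    in_omim: gene passes the OMIM filters (Mendelian disease association).
    two_hit_reported: gene appears in any two-hit LoF gene list.
    pli: gnomAD pLI score, or None if unavailable.
    caf: cumulative pLoF allele frequency in gnomAD.
    """
    if in_omim:
        return "Mendelian disease association"
    if two_hit_reported:
        return "2-hit LoF reported"
    if pli is not None and pli > 0.9:
        return "likely haploinsufficient"
    if caf == 0:
        return "pLoF not yet observed"
    return "pLoF observed in gnomAD"

print(classify_gene(False, False, 0.95, 0.0))
```

Because the checks are ordered, a gene with both a disease association and a high pLI is counted only once, under the earlier category.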


Genetic prevalence estimation 

Here, we define ‘genetic prevalence’ for a given gene as the proportion 
of individuals in the general population at birth who have a patho- 
genic variant in that gene that will cause them to later develop disease. 
Genetic prevalence has not been well-studied or estimated for most 
disease genes. 

In principle, it should be possible to estimate genetic prevalence 
simply by examining the allele frequency of reported pathogenic variants
in gnomAD. In practice, three considerations usually preclude this
approach. First, the present gnomAD sample size of 141,456 exomes
and genomes is still too small to permit accurate estimates for very rare
diseases. Second, the mean age of gnomAD individuals is approximately
55, which is above the age of onset for many rare genetic diseases, and
individuals with known Mendelian disease are deliberately excluded, 
so pathogenic variants will be depleted in this sample relative to the 
whole birth population. Third and most importantly, a large fraction of 
reported pathogenic variants lack strong evidence for pathogenicity 
and are either benign or low penetrance, so without careful cura- 
tion of pathogenicity assertions, summing the frequency of reported 
pathogenic variants in gnomAD will in most cases vastly overestimate 
the true genetic prevalence of a disease. 

Instead, we searched the literature and very roughly estimated 
genetic prevalence based on available data. In most cases, we took 
disease incidence (new cases per year per population), multiplied by 
proportion of cases due to variants in a gene of interest, and multiplied
by average age at death in cases. In some cases, estimates of at-risk 
population or direct measures of genetic prevalence were available. 
Details of the calculations undertaken for each gene are provided in 
Extended Data Table 4. 
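The rough estimate described above is a product of three terms. A minimal sketch, with a hypothetical function name, using the Huntington's disease inputs listed in Extended Data Table 4 (incidence of 0.38 per 100,000 per year, all cases attributed to HTT, average age at death of about 60):

```python
def genetic_prevalence(incidence_per_100k_per_year, fraction_due_to_gene,
                       mean_age_at_death):
    """Approximate proportion of the birth population destined to develop disease."""
    incidence = incidence_per_100k_per_year / 100_000
    return incidence * fraction_due_to_gene * mean_age_at_death

htt = genetic_prevalence(0.38, 1.0, 60)
print(f"1 in {round(1 / htt):,}")
```

This reproduces the 1 in 4,386 figure given for HTT in Extended Data Table 4; as the text notes, such values are order-of-magnitude estimates.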


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


The gnomAD v2 data are available via the gnomAD browser
(https://gnomad.broadinstitute.org).


Code availability 


Additional data and the R 3.5.1 and Python 2.7.10 source code for
this study are available via GitHub
(https://github.com/ericminikel/drug_target_lof).
Extended Data Fig. 1 | Drug target constraint by modality and indication.
Mean (dots) and 95% confidence interval (line segments) for constraint in
subsets of drug-target sets (data sources and number of genes for each list are
provided in Extended Data Table 1). Modality information was extracted from 
DrugBank and indication information from ATC codes; see Extended Data 
Table 1 for details. 


[Figure panels: rows for all genes, all drug targets, protein family (rhodopsin-like GPCRs, ion channels, nuclear receptors, enzymes), human disease association (GWAS hits, OMIM genes) and tissues with expression >1 TPM (all, some, none); x axes show pLoF obs/exp ratio (0-100%) and fold depletion/fold enrichment.]

Extended Data Fig. 2 | Drug-target gene set confounding. a, Forest plot of
means (dots) and 95% confidence intervals of the mean (line segments) for gene
sets evaluated for confounding with drug-target status. Data sources and
number of genes for each list are provided in Extended Data Table 1. LoF
obs/exp ratios differ significantly from the set of all genes for four canonically
druggable protein families (top), human disease-associated genes (middle),
and genes by broadness of tissue expression (bottom). Within each class,
the genes that are drug targets have a lower mean obs/exp ratio (hollow
grey circles) than the class overall. b, The druggable protein families,
disease-associated genes, and genes expressed in some tissues but not others
are enriched several-fold among the set of drug targets. Bars indicate fold
enrichment and error bars indicate 95% confidence intervals.
c-e, Composition of drug targets when broken down by protein family (c),
disease association (d), or broadness of tissue expression (e). The enriched
classes account for most drug targets. In a linear model, after controlling for
protein family, disease association status, and number of tissues with
expression >1 transcript per million (TPM), drug targets are still more
constrained than other genes (-8.0% obs/exp, nominal P = 0.00011, t = -3.9,
df = 17,325 for the contribution of drug target in the linear regression
obs/exp ~ drug_target + family + dz_assoc + n_tissues), but the probable
existence of additional unobserved confounders cautions against
over-interpretation of this observation (see main text).
Extended Data Fig. 3 | Expected frequency of individuals with one or two
null alleles for every protein-coding gene across different population
models, with sample size held constant. This is identical to Fig. 2 except as
follows. As noted in the Methods, one caveat about Fig. 2 is that the sample size
is larger for the plots using all gnomAD exomes (Fig. 2a, c) than for Finnish
exomes (Fig. 2b). This figure shows the same analysis, but with the global
gnomAD population downsampled to 10,824 randomly chosen exomes so that
the sample size is identical to that of Finnish exomes. Computation of
p = 1 - sqrt(q) as described in the Methods is computationally expensive for
downsampled datasets because it requires individual-level genotypes. Instead,
this analysis uses ‘classic’ CAF, which is simply the sum of allele frequencies of
all high-confidence pLoF variants each at allele frequency <5%, capped at a
total of 100%, for both global and Finnish exomes. The results show that even
when the sample size is held constant, the number of genes with zero pLoF
variants observed is higher in a bottlenecked population than in a mostly
outbred population. A constant y axis with no axis breaks is used in this figure
to make this difference more clearly visible.
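The ‘classic’ CAF needs only per-variant allele frequencies, not individual-level genotypes. A minimal sketch, assuming a hypothetical function name and made-up frequencies:

```python
def classic_caf(allele_frequencies, max_af=0.05):
    """Sum allele frequencies of variants below max_af, capped at 1.0 (100%)."""
    total = sum(af for af in allele_frequencies if af < max_af)
    return min(total, 1.0)

# Hypothetical gene with three pLoF variants; the 20% variant is excluded
# by the allele-frequency filter.
print(classic_caf([0.001, 0.002, 0.2]))
```

Unlike p = 1 - sqrt(q), this sum counts an individual carrying two different variants twice, so it can slightly overstate the frequency for genes with many common pLoF variants.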


Extended Data Table 1 | Data sources for gene lists used in this study

list | N | N* | reference / criteria
All | 19,194 | 17,604 | HGNC.
Olfactory receptors | 371 | 325 | Mainland et al.
Homozygous LoF tolerant | 330 | 325 | ≥2 different high-confidence pLoF variants each homozygous in ≥1 individual in gnomAD exomes.
Autosomal recessive | 527 | 519 | Blekhman et al.
Autosomal dominant | 307 | 305 | Blekhman et al.
Essential in culture | 683 | 659 | Hart et al.
ClinGen haploinsufficient | 294 | 288 | ClinGen Dosage Sensitivity Map level 3
Approved drug targets | 386 | 383 | DrugBank 5.0 XML release (acc. Sep 12, 2018); top-ranked mechanistic target of approved drugs. group=='approved', target.attrib['position'] == '1', known-action=='yes'
Positive targets | 143 | 142 | DrugBank action: activator, agonist, chaperone, cofactor, gene replacement, inducer, partial agonist, positive allosteric modulator, positive modulator, potentiator, stimulator
Negative targets | 243 | 241 | DrugBank action: antagonist, blocker, degradation, inhibitor, inverse agonist, negative modulator, neutralizer, suppressor
Other & unknown (effect) | 94 | 94 | DrugBank action other or unlisted.
Small molecule | 176 | 175 | DrugBank type == 'small'
Antibody | 18 | 18 | DrugBank type == 'biotech' and 'Antibodies' in categories
Other (modality) | 35 | 35 | DrugBank type == 'biotech' and 'Antibodies' not in categories
Oncology | 45 | 45 | ATC level 1 code L
Cardiovascular | 38 | 38 | ATC level 1 code C
Endocrine | 24 | 24 | ATC level 1 code G or H
Metabolic & alimentary | 38 | 38 | ATC level 1 code A
Neurology | 35 | 35 | ATC level 1 code N
Respiratory | 12 | 11 | ATC level 1 code R
Skeletomuscular | 14 | 14 | ATC level 1 code M
Other (indication) | 29 | 28 | ATC level 1 code B, D, J, P, S, or V
Rhodopsin-like GPCRs | 689 | 604 | HGNC gene set 140: "G protein-coupled receptors, Class A rhodopsin-like".
Ion channels | 326 | 323 | HGNC gene set 177: "Ion channels".
Nuclear receptors | 48 | 47 | IUPHAR/BPS Guide to Pharmacology "Nuclear receptors".
Enzymes | 1,178 | 1,144 | IUPHAR/BPS Guide to Pharmacology "Enzymes".
GWAS hits | 6,336 | 6,080 | GWAS Catalog MAPPED_GENE column (P < 5e-8)
OMIM genes | 3,367 | 3,294 | OMIM (acc. June 11, 2019) phenotypes with MIM number, lacking '?', '{', '[', "response", "susceptibility", or "somatic".
All (tissues) | 7,931 | 7,550 | >1 TPM in all 53 tissues in GTEx v7
Some (tissues) | 9,698 | 9,009 | >1 TPM in >0 and <53 tissues in GTEx v7
None (tissues) | 1,076 | 776 | >1 TPM in 0 tissues in GTEx v7
Mouse heterozygous lethal knockout | 401 | 395 | MouseMine

For analysis, only protein-coding genes with unambiguous mapping to current approved gene symbols were used; numbers in the table reflect this. Values in the N column indicate totals from
the full universe of 19,194 genes; values in the N* column indicate the subset of genes with non-missing constraint values, used for Fig. 1 and Extended Data Figs. 1, 2.
Extended Data Table 2 | Spectrum of tolerance to genetic inactivation among human drug targets

drug class | example | gene | obs/exp pLoF
topoisomerase I inhibitors | irinotecan | TOP1 | 0% (0/50.5)
M1-selective antimuscarinics | pirenzepine | CHRM1 | 0% (0/14.1)
cytoskeleton disruptors | paclitaxel | TUBB | 6% (1/16.4)
non-steroidal anti-inflammatory drugs | aspirin | PTGS2 | 10% (3/29.7)
statins | atorvastatin | HMGCR | 13% (6/46.3)
phosphodiesterase 5 inhibitors | sildenafil | PDE5A | 33% (16/47.8)
antifolates | methotrexate | DHFR | 38% (4/10.5)
proton pump inhibitors | omeprazole | ATP4A | 52% (25/47.9)
antiplatelets | clopidogrel | P2RY12 | 66% (5/7.6)
H1 antihistamines | cetirizine | HRH1 | 76% (11/14.5)
angiotensin converting enzyme inhibitors | benazepril | ACE | 87% (62/71.3)
cholesterol-lowering antibodies | alirocumab | PCSK9 | 98% (26/26.5)

Example targets are arranged from the most intolerant (top) to the most tolerant (bottom) of inactivation.


Extended Data Table 3 | Curation of pLoF variation in six neurodegenerative disease genes

gene | coding sequence (bp) | pLoF obs/exp after curation | CAF before curation | CAF after curation | true pLoF heterozygote frequency | GoF disease genetic prevalence
HTT | 9,426 | 8.2% | 6.2% | 0.013% | 1 in 3,800 | 1 in 2,400-4,400
LRRK2 | 7,581 | 41% | 0.23% | 0.09% | 1 in 500 | 1 in 3,300
MAPT | 2,328 | 0% (a) | 14% | 0% | - | 1 in 5,000-31,000
PRNP | 759 | 99% (b) | 0.0035% | 0.0021% | 1 in 18,000 | 1 in 50,000
SNCA | 420 | 0% | 0.0012% | 0% | - | 1 in 360,000
SOD1 | 462 | 18% | 0.0060% | 0.0038% | 1 in 26,000 | 1 in 27,000-83,000

Shown are the coding sequence length (base pairs, bp), constraint value (pLoF obs/exp) after filtering and curation, cumulative allele frequency before and after filtering and manual curation,
estimated frequency of true pLoF heterozygotes in the population, and genetic prevalence (population frequency including pre-symptomatic individuals) of the GoF disease associated with
the gene. Genetic prevalence calculations are described in Extended Data Table 4, and variant curation details are provided in Supplementary Table 1, except for LRRK2, which is described in
detail elsewhere.

(a) Constitutive brain-expressed exons only.

(b) PRNP codons 1-144; see Fig. 3c for rationale.
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Extended Data Table 4 | Estimation of genetic prevalence for GoF genetic neurodegenerative diseases 


gene basis for genetic prevalence estimation estimate 


HTT A reported HD incidence of 0.38 cases per 100,000 per year based on meta- 1 in 4,386 
analysis®® multiplied by an average age at death of ~60 for the most common 
CAG lengths*®. Finally, a genetic screen of a general population sample® found 
240 CAG repeat alleles, which are presumed to be fully penetrant, in 3 
individuals out of 7,315, for a genetic prevalence of 1 in 2,438. 
HTT Prevalence of 13.7 per 100,000 symptomatic plus 81.6 per 100,000 at 25-50% 1 in 2,451 
risk in an exhaustive ascertainment study®'. Assuming there are twice as many 
individuals at 25% risk as at 50% risk, then on average 33.3% of the 81.6, or 
27.1 per 100,000 have the mutation. Thus, 13.7 + 27.1 = 40.8 per 100,000 
individuals have an HTT CAG expansion. 


HTT A genetic screen of a general population sample®? found 240 CAG repeat 1 in 2,438 
alleles, which are presumed to be fully penetrant, in 3 individuals out of 7,315. 
LRRK2_ Based on meta-analysis®*, Parkinson’s disease (PD) has an estimated 1 in 3,300 


prevalence of 1,903 per 100,000 at age 280, meaning the general population’s 
lifetime risk of PD is ~1.9%. It is generally stated that about 10% of PD cases 
are “familial” and the remainder sporadic; in a diverse worldwide case series, 
LRRK2 mutations were found in 179/14,253 (1.3%) sporadic cases and 
201/5,123 (3.9%) familial cases®, implying that LRRK2 mutations are present in 
~1.6% of all PD cases. Thus, LRRK2 mutations account for a 1.6% * 1.9% = 
~0.030% lifetime risk of PD in the general population®. 

MAPT Pathogenic MAPT mutations can present with a variety of clinical phenotypes, 1 in 5,000 — 
and common MAPT haplotypes are associated with risk for a variety of different 31,000 
neurodegenerative disorders; we were unable to identify any studies of genetic 
prevalence nor any large case series for any MAPT-associated phenotype. As a 
crude estimate, frontotemporal dementia has a reported incidence of 2.7-4.1 per 
100,000 per year® with typical age at death of perhaps 60, and MAPT mutations 
accounting for 5-20% of familial cases, and familial cases accounting for 40% of 
all cases®. Multiplying all these figures results in range of 0.0032% to 0.020%. 

PRNP | We recently considered the lifetime risk of genetic prion disease in detail®®. Prion 1 in 50,000 
disease (including sporadic, genetic, and acquired) causes ~1 in 5,000 people 
based on either death certificate analysis or division of disease incidence by the 
overall death rate*'®*, ~10% of cases are attributable to PRNP variants with 
evidence for Mendelian segregation (although additional cases harbor lower- 
penetrance variants)*°. Thus, we expect a genetic prevalence of 1 in 50,000. On 
the order of ~1 in 100,000 people in gnomAD and 23andMe harbor high- 
penetrance PRNP variants*®*1, although as noted above, we expect these 
datasets to be depleted compared to the population at birth, because prion 
disease is rapidly fatal and many individuals in these databases are above the 
typical age of onset. 

SNCA _ As explained above for LRRK2, we assumed a 1.9% lifetime risk of Parkinson’s 1 in 360,000 
disease (PD) in the general population, with 10% of cases being familial. SNCA 
point mutations, duplications, and triplications all appear to be highly penetrant, 
and in a familial PD case series these accounted for 103/709 = 15% of 
individuals®’. Thus, we estimate that SNCA mutations account for a 1.9% * 10% 

* 15% = 0.00028% risk of PD in the general population. 

SOD1  SOD1 mutations are believed to account for ~12% to 24% of familial ALS®*:® 1 in 27,000- 
and 1% of sporadic ALS®:”°. One a meta-analysis found that ~4.6% of ALS is 83,000 
familial”’, although a figure of 10% is also often used’*. These figures imply that 
~1.5 — 3.3% of all ALS is attributable to SOD1. The overall incidence of ALS is 
reported at ~1.6 — 2.2 per 100,000 per year’*:*, so the incidence of SOD1 ALS 
might be estimated at ~0.024 — 0.073 per 100,000 per year. Age at death of ~50 
is around average for many SOD1 mutations®, implying a 1.2 — 3.7 per 100,000 
population prevalence of pathogenic SOD1 mutations. 


Data sources were identified by PubMed and Google Scholar searches. Genetic prevalence was defined as the proportion of the population at birth carrying a mutation and destined to later 
develop disease, and estimated as described for each gene. The following references are cited in the table: refs. °64'°°", 

It is important to consider how this figure relates to the penetrance of LRRK2 mutations, as LRRK2 variants appear to occupy a spectrum of penetrance”. Some variants exhibit Mendelian 
segregation with disease”, implying high risk; the G2019S variant is estimated to have approximately 32% penetrance”; and other common variants are risk factors with odds ratios of only 
around 1.2 estimated through genome-wide association studies (GWAS)”. The GWAS-implicated common variants were not included in the case series on which our estimate is based®,
but G2019S does account for most cases in that series. Because the 0.03% estimate here is based on counting symptomatic cases rather than asymptomatic individuals, it will appropriately
underestimate the number of G2019S carriers. In essence, in this calculation each G2019S carrier in the population only counts as 1/3 of a person, because they have only a 1/3 probability of 
developing a disease. It is therefore appropriate that our estimate of genetic prevalence (0.03%) is actually lower than double the allele frequency of G2019S in gnomAD (0.1%). 
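
The back-of-envelope estimates for MAPT, PRNP and SOD1 in the table above can be retraced with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are assumptions introduced here, and the inputs are the figures quoted in the table text. Without intermediate rounding, the upper SOD1 bound comes out near 3.6 per 100,000; the 3.7 quoted in the table follows from first rounding the incidence to 0.073 per 100,000 per year.

```python
# Illustrative re-computation of the table's prevalence arithmetic.
# All inputs are the figures quoted in the table; names are hypothetical.

def lifetime_fraction(incidence_per_100k_per_year, years):
    """Turn an annual incidence into a cumulative fraction of the population."""
    return incidence_per_100k_per_year / 100_000 * years


def sod1_share(familial_fraction, sod1_in_familial, sod1_in_sporadic=0.01):
    """Fraction of all ALS attributable to SOD1 (familial plus sporadic cases)."""
    return (familial_fraction * sod1_in_familial
            + (1 - familial_fraction) * sod1_in_sporadic)


# MAPT: FTD incidence x age at death x familial share x MAPT share of familial
mapt_low = lifetime_fraction(2.7, 60) * 0.40 * 0.05
mapt_high = lifetime_fraction(4.1, 60) * 0.40 * 0.20

# PRNP: ~1 in 5,000 overall, ~10% attributable to Mendelian PRNP variants
prnp = (1 / 5_000) * 0.10

# SOD1: share of all ALS, then ALS incidence x age at death of ~50
share_low = sod1_share(0.046, 0.12)
share_high = sod1_share(0.10, 0.24)
sod1_low = lifetime_fraction(1.6 * share_low, 50)
sod1_high = lifetime_fraction(2.2 * share_high, 50)

for label, value in [("MAPT low", mapt_low), ("MAPT high", mapt_high),
                     ("PRNP", prnp),
                     ("SOD1 low", sod1_low), ("SOD1 high", sod1_high)]:
    print(f"{label}: {value:.4%} (about 1 in {1 / value:,.0f})")
```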


Extended Data Table 5 | Details of PRNP-truncating variants 


variant   allele count   neurological phenotype   comments   reference
G20Gfs84X 1 healthy As previously reported. ai 
R37X 2 healthy, unknown One previously reported, one new. 41, this work
Q41X 1 unknown this work 
H69 frameshifts 2 N/A False variant calls in gnomAD, apparent alignment artifact due to octapeptide repeat region. this work
Q75X 1 healthy As previously reported oh 
W81X 1 unknown this work 
W99X 1 unknown this work 
G131X 1 healthy The presence of this variant in the ExAC database was previously reported, but without phenotype information. We now report that this individual is a 77-year-old male, cognitively well with no family history of dementia. Ascertained as a case in a study of coronary artery disease, this individual has hypertension and well-controlled dyslipidemia and has undergone one bypass surgery. He has two adult children. 41, this work
Y145X 1 dementia ot 
Q160X 5 dementia aa 
Y162X 1 dementia = 
Y163X Fé dementia Bae 
Y169X 2 dementia oF 
D178Efs25X 1 dementia ae 
Q186X 1 dementia A 
Y226X 1 dementia 89 
Q227X 1 dementia os 
L234Pfs7X 1 dementia Ascertained as a female case in the Finnish twins Alzheimer disease cohort. Died at age 7000 proximal cause pneumonia, ultimate cause diagnosed as Alzheimer disease based on clinical examination only. Had a dizygotic twin not included in gnomAD. this work


Allele count for variants from the literature in Fig. 3c is the total number of definite or probable cases with sequencing performed in the studies cited in this table. The L234Pfs7X variant 
changes the C-terminal GPI signal of prion protein from SMVLFSSPPVILLISFLIFLIVGX to SMVPSPLHLX. This new sequence does not adhere to the known rules of GPI anchor attachment®: GPI 
signals must contain a 5-10-polar-residue spacer followed by 15-20 hydrophobic residues. Thus, this frameshifted prion protein would be predicted to be secreted and thus may be pathogenic, 
explaining the Alzheimer’s disease diagnosis in this individual. However, it is also possible that the new C-terminal sequence found here interferes with prion formation, and/or that this variant is
incompletely penetrant, and that the diagnosis of Alzheimer’s disease in this individual is merely a coincidence. The following references are cited in the table: refs. “"*"®°.
Statistics


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) 
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable.


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 


Data collection Analyses utilized Python 2.7.10 and R 3.5.1. Data and code sufficient to produce the plots and analyses in this paper are available at 
https://github.com/ericminikel/drug_target_lof 


Data analysis Analyses utilized Python 2.7.10 and R 3.5.1. Data and code sufficient to produce the plots and analyses in this paper are available at 
https://github.com/ericminikel/drug_target_lof 


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 
Policy information about availability of data


All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 
- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability


Analyses utilized Python 2.7.10 and R 3.5.1. Data and code sufficient to produce the plots and analyses in this paper are available at
https://github.com/ericminikel/drug_target_lof




Field-specific reporting 


Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection. 
Life sciences study design


All studies must disclose on these points even when the disclosure is negative. 


Sample size This study was opportunistic, and involved secondary use of all available genome and exome data. No sample size was predetermined. Our 
flagship analysis of gnomAD loss-of-function variants (Karczewski et al, https://doi.org/10.1101/531210) indicates that the dataset is well- 
powered to examine constraint against such variants — for instance, 72% of genes have at least 10 pLoF variants expected in this sample size 
based on mutation rates. 


Data exclusions Sample QC and variant QC for the gnomAD database are described extensively by Karczewski et al, https://doi.org/10.1101/531210. Notably, 
individuals with severe pediatric disease, and known first-degree relatives of those with severe pediatric disease, were excluded.


Replication We did not attempt to reproduce any findings in a separate dataset, as no other exome or genome sequencing dataset of comparable size 
exists. 


Randomization As this was a population-based study, and not a case-control study, no randomization was performed.


Blinding As this was a population-based study, and not a case-control study, blinding was not relevant. 


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems Methods 

n/a | Involved in the study n/a | Involved in the study 
Antibodies ChIP-seq 
Eukaryotic cell lines Flow cytometry 
Palaeontology MRI-based neuroimaging 
Animals and other organisms


Human research participants 


Clinical data 


Human research participants 


Policy information about studies involving human research participants 


Population characteristics As an opportunistic collection of data, the participants in gnomAD were not selected based on age, gender, or genotypic 
information. As described above, individuals with severe pediatric disease, and known first-degree relatives of those with severe
pediatric disease were excluded. The population and dataset inclusion criteria are described in more detail by Karczewski et al, 
https://doi.org/10.1101/531210 


Recruitment The generation of the gnomAD database was an opportunistic secondary-use study; we did not recruit any participants. The
study is described in more detail by Karczewski et al, https://doi.org/10.1101/531210 


Ethics oversight This study was performed under ethical approval from the Partners Healthcare Institutional Research Board (2013P001339/ 
MGH) and the Broad Institute Office of Research Subjects Protection (ORSP-3862) in compliance with all relevant ethical 
regulations; informed consent was obtained from all research participants. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 


Coronavirus disease 2019 (COVID-19) is an acute infection of the respiratory tract that 
emerged in late 2019”. Initial outbreaks in China involved 13.8% of cases with severe 
courses, and 6.1% of cases with critical courses*. This severe presentation may result 
from the virus using a virus receptor that is expressed predominantly in the lung”*; the 
same receptor tropism is thought to have determined the pathogenicity—but also 
aided in the control—of severe acute respiratory syndrome (SARS) in 2003°. However, 
there are reports of cases of COVID-19 in which the patient shows mild upper 
respiratory tract symptoms, which suggests the potential for pre- or oligosymptomatic 
transmission*® ©. There is an urgent need for information on virus replication, 
immunity and infectivity in specific sites of the body. Here we report a detailed 
virological analysis of nine cases of COVID-19 that provides proof of active virus 
replication in tissues of the upper respiratory tract. Pharyngeal virus shedding was 
very high during the first week of symptoms, with a peak at 7.11 × 10^8 RNA copies per
throat swab on day 4. Infectious virus was readily isolated from samples derived from 
the throat or lung, but not from stool samples—in spite of high concentrations of virus 
RNA. Blood and urine samples never yielded virus. Active replication in the throat was 
confirmed by the presence of viral replicative RNA intermediates in the throat 
samples. We consistently detected sequence-distinct virus populations in throat and
lung samples from one patient, proving independent replication. The shedding of
viral RNA from sputum outlasted the end of symptoms. Seroconversion occurred
after 7 days in 50% of patients (and by day 14 in all patients), but was not followed by a
rapid decline in viral load. COVID-19 can present as a mild illness of the upper 
respiratory tract. The confirmation of active virus replication in the upper respiratory 
tract has implications for the containment of COVID-19. 


There is a close genetic relationship between SARS coronavirus 
(SARS-CoV) and the causative agent of COVID-19, SARS-CoV-2. The pre- 
dominant expression of ACE2 in the lower respiratory tract is believed 
to have determined the natural history of SARS as an infection of the 
lower respiratory tract®. Although the positive detection of SARS-CoV-2 
in clinical specimens from the upper respiratory tract has previously 
been described*”, these observations do not address the principal dif- 
ferences between SARS and COVID-19 in terms of clinical pathology. The 
patients who were studied here were enrolled because they acquired their 
infections upon known close contact to an index case, thereby avoiding 
representational biases owing to symptom-based case definitions. All 
patients were treated in a single hospital in Munich, Germany. Virological 
testing was done by two closely collaborating laboratories that used the 
same standards of technology for PCR with reverse transcription (RT-PCR)
and virus isolation; these two laboratories confirmed each other’s
results in almost all of the individual samples. Owing to the extremely
high congruence of results, all data—except for the serological data 
(which are based on results from one laboratory only)—are presented 
together. The patients are part of a larger cluster of epidemiologically 
linked cases that occurred after 23 January 2020 in Munich, as discovered 
on 27 January (ref."). The present study uses samples taken during the 
clinical course in the hospital, as well as from initial diagnostic testing 
before admission. In cases in which this initial diagnostic testing was done 
by other laboratories, the original samples were retrieved and retested 
under the rigorous quality standards of the present study. 


RT-PCR, replication sites and infectivity 


To first understand whether the described clinical presentations are 
solely caused by infection with SARS-CoV-2, samples from all patients 
Fig. 1| Hallmarks of viral shedding in aggregated samples. a, Samples and
sample types per day. b, Viral RNA concentrations in samples from the upper
respiratory tract. Neg., sample negative for RNA copies. c, Viral RNA 
concentrations in sputum and stool samples. d, Seroconversion and virus 
isolation success, dependent on day after the onset of symptoms. Top, fraction 
of seroconverted patients. Bottom, aggregated results of virus isolation trials. 
e, Virus isolation success, dependent on viral load. Viral loads were projected 
to RNA copies per ml (for sputum samples), per swab (for throat swab samples) 
or per g (for stool samples). f, g, Projected virus isolation success based on 
probit distributions. The inner lines are probit curves (dose-response rule). 
The outer dotted lines are 95% confidence intervals. For <5% isolation success,
the estimated day was 9.78 (95% confidence interval 8.45-21.78) days after the
onset of symptoms, and the estimated RNA concentration for <5% isolation
success was 5.40 log10(RNA copies per ml) (95% confidence
interval −4.11 to 6.51). h, Subgenomic viral RNA transcripts in relation to viral
genomic RNA. Dots represent mean values of RT-PCR data obtained from at 
least two independent experiments on samples from individual patients. Plots 
show median values with interquartile ranges. 


were tested against a panel of typical agents of respiratory viral infection,
including human coronavirus (HCoV)-HKU1, HCoV-OC43,
HCoV-NL63 and HCoV-229E, influenza virus A, influenza virus B, rhinovirus,
enterovirus, respiratory syncytial virus, human parainfluenza
viruses 1-4, human metapneumovirus, adenovirus and human bocavirus.
No coinfection was detected in any patient.

All patients were initially diagnosed by RT-PCR from oro- or nasopharyngeal
swab specimens”. Both types of specimen were collected
over the whole clinical course in all patients. There were no discernible
differences in viral loads or detection rates when comparing naso- and
oropharyngeal swabs (Fig. 1b). The earliest swabs were taken on day 1
of symptoms, which were often very mild or prodromal. All swabs from
all patients taken between day 1 and day 5 tested positive. The average
virus RNA load was 6.76 × 10^5 copies per whole swab until day 5, and
the maximum load was 7.11 × 10^8 copies per swab. Swab samples taken
after day 5 had an average viral load of 3.44 × 10^5 copies per swab and
Table 1 | Single-nucleotide polymorphism at genome
position 6446 in clinical samples from patient no. 4


Day after onset of symptoms


5 6 7 8 9 10 11
Swab A A 
Sputum G G G G>A 


Stool 


a detection rate of 39.93%. The last swab sample that tested positive
was taken on day 28 after the onset of symptoms. The average viral load
in sputum was 7.00 × 10^6 copies per ml, with a maximum of 2.35 × 10^9
copies per ml.

Because swab samples had limited sensitivity for the initial diagnosis
of cases of SARS“, we analysed the first paired swab and sputum
samples taken on the same occasion from seven patients. All samples
were taken between 2 and 4 days after the onset of symptoms. In two
cases, swab samples had virus concentrations that were clearly higher
than those in sputum samples, as indicated by a difference of >3 in the
threshold cycle (Ct) value. The opposite was true in two other cases, and
the remaining three cases had similar concentrations in both sample
types.
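
To give a sense of scale for a threshold-cycle difference of more than 3, the sketch below converts a Ct difference into an approximate fold difference in template. It assumes ideal amplification, in which template doubles every cycle; real assays usually fall somewhat short of that, so the `efficiency` parameter, the function name and the example Ct values are illustrative assumptions rather than values from the study.

```python
def fold_difference(ct_a, ct_b, efficiency=1.0):
    """Approximate fold excess of template in sample A relative to sample B.

    A lower Ct means more starting template. efficiency=1.0 assumes the
    template doubles every cycle (an idealisation, not a measured value).
    """
    return (1.0 + efficiency) ** (ct_b - ct_a)


# Hypothetical pair: swab at Ct 25, sputum at Ct 28 (a difference of 3 cycles)
print(fold_difference(25.0, 28.0))
```

Under the doubling assumption, a Ct difference of 3 corresponds to roughly an eightfold difference in RNA concentration, so ">3" means the two sample types differed by about an order of magnitude.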

None of 27 urine samples and none of 31 serum samples tested positive
for RNA from SARS-CoV-2.

To understand infectivity, we attempted live virus isolation on mul- 
tiple occasions from clinical samples (Fig. 1d). Whereas the virus was 
readily isolated during the first week of symptoms from a considerable
fraction of samples (16.66% of swabs and 83.33% of sputum samples), 
no isolates were obtained from samples taken after day 8 in spite of 
ongoing high viral loads. 

Virus isolation from stool samples was never successful, irrespective 
of viral RNA concentration, on the basis of a total of 13 samples taken 
between day 6 and day 12 from 4 patients. The success of virus isolation
also depended on viral load: samples that contained <10^6 copies
per ml (or copies per sample) never yielded an isolate. For swab and
sputum samples, interpolation based on a probit model was done to
obtain laboratory-based infectivity criteria for the discharge of patients
(Fig. 1e, f).
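
The probit interpolation mentioned above can be sketched as follows. In a probit model the probability of a successful isolation is the standard normal cumulative distribution applied to a linear predictor, so the predictor value at which success drops to a chosen probability is found by inverting that relationship. The intercept and slope below are made-up placeholders, not the coefficients fitted in the study, and the function names are introduced here for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def isolation_probability(x, intercept, slope):
    """Probit model: P(isolation succeeds) = Phi(intercept + slope * x)."""
    return _STD_NORMAL.cdf(intercept + slope * x)


def predictor_at_probability(p, intercept, slope):
    """Invert the probit curve: the x at which the success probability is p."""
    return (_STD_NORMAL.inv_cdf(p) - intercept) / slope


# Hypothetical coefficients with x = log10(RNA copies per ml); NOT study values
intercept, slope = -8.0, 1.2
threshold = predictor_at_probability(0.05, intercept, slope)
print(f"5% isolation success at about {threshold:.2f} log10 copies per ml")
```

The same inversion applies when the predictor is the day after symptom onset; the slope is then negative because isolation success falls over time.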

High viral loads and successful isolation from early throat swabs 
suggested potential virus replication in tissues of the upper respira- 
tory tract. To obtain proof of active virus replication in the absence 
of histopathology, we conducted RT-PCR tests to identify viral sub- 
genomic mRNAs directly in clinical samples (Extended Data Fig. 1). 
Viral subgenomic mRNA is transcribed only in infected cells and is 
not packaged into virions, and therefore indicates the presence of 
actively infected cells in samples. Levels of viral subgenomic mRNA were 
compared against viral genomic RNA in the same sample. In sputum 
samples taken on day 4 to day 9, during which time active replication 
in sputum was obvious in all patients as per longitudinal viral load 
courses (as described in ‘Viral load, antibody response and clinical 
course’), the ratios of mean normalized subgenomic mRNA per genome 
were about 0.4% (Fig. 1g). A decline occurred from day 10 to day 11. In 
throat swabs, all samples taken up to day 5 were in the same range, 
whereas no subgenomic mRNA was detectable in swabs thereafter. 
Together, these data indicate the active replication of SARS-CoV-2 in 
the throat during the first five days after the onset of symptoms. No 
(or only minimal) indications of replication in stool were obtained by 
the same method (Fig. 1g). 

During our study, we sequenced full virus genomes from all patients. 
A G6446A exchange was first detected in one patient, and later trans- 
mitted to other patients in the cluster". In the first patient, this muta- 
tion was found in a throat swab while a sputum sample from the 
same day showed the original allele (G6446). The single-nucleotide 
Fig. 2| Viral load kinetics, seroconversion and clinical observations in
individual cases. a-i, The panels correspond to patients no. 1 (a), 2 (b), 3 (c), 4
(d), 7 (e), 8 (f), 10 (g), 14 (h) and 16 (i) in a previous publication". Dotted lines,


polymorphism was analysed by RT-PCR and Sanger sequencing in all 
sequential samples available from that patient (Table 1). The presence 
of separate genotypes in throat swabs and sputum strongly supported 
our suspicion of independent virus replication in the throat, rather 
than passive shedding to the throat from the lung. 


Viral load, antibody response and clinical course 


Daily measurements of viral load in sputum, pharyngeal swabs and 
stool are summarized in Fig. 2. In general, the concentrations of 
viral RNA were very high in initial samples. In all patients except 
one, the concentration of viral RNA in throat swabs seemed to be 
already on the decline at the time of first presentation. Viral RNA 
concentrations in sputum declined more slowly, with a peak during 
the first week of symptoms in three out of eight patients. Viral RNA 
concentrations in stools were also high. In many cases, the course 
of viral RNA concentration in stools seemed to reflect the course in 
sputum (Fig. 2a-c). In only one case did independent replication in
the intestinal tract seem obvious from the course of stool RNA excre- 
tion (Fig. 2d). Whereas symptoms mostly waned until the end of the 
first week (Table 2), viral RNA remained detectable in throat swabs 
well into the second week. Stool and sputum samples remained 
RNA-positive over three weeks in six of the nine patients, in spite 
of full resolution of symptoms. 

All cases had comparatively mild courses (Table 2). The two patients 
who showed some signs of lung infection were the only cases in which 
sputum viral loads showed a late and high peak around day 10 or 11, 
whereas sputum viral loads were on the decline by this time in all other
patients (Fig. 2f, g). Of note, four out of nine patients showed a loss of 
taste and olfactory sensation, and described this loss to be stronger 
and more long-lasting than in common cold diseases. 




limit of quantification. Experiments were performed in duplicate and the data 
presented are the mean of results obtained by two laboratories independently. 
PRNT90, serum dilution that causes viral plaque reduction of 90%.


Seroconversion was detected by IgG and IgM immunofluorescence 
using cells that express the spike protein of SARS-CoV-2 and a virus 
neutralization assay using SARS-CoV-2 (Table 3, Extended Data Fig. 2). 
Seroconversion in 50% of patients occurred by day 7, and in all patients 
by day 14 (Fig. 1d). No viruses were isolated after day 7. All patients 
showed detectable neutralizing antibodies, the titres of which did 
not suggest close correlation with clinical courses. Of note, patient 
no. 4, who showed the lowest virus neutralization titre at end of week 2, 
seemed to shed virus from stool over a prolonged time (Fig. 2d). Results 
from the differential recombinant immunofluorescence assay indicated 
cross-reactivity or cross-stimulation against the four endemic human 
coronaviruses in several patients (Extended Data Table 1). 


Conclusions 


The clinical courses in the patients under study—all of whom were 
young- to middle-aged professionals without notable underlying
disease—were mild. Apart from one patient, all cases were first tested 
when symptoms were still mild or in the prodromal stage (a period in 
which most patients would present once there is general awareness 
of a circulating pandemic disease’). Diagnostic testing suggests that
simple throat swabs will provide sufficient sensitivity at this stage of 
infection. This is in stark contrast to SARS; for instance, only 38 of 98 
nasal or nasopharyngeal swab samples tested positive by RT-PCR in 
patients with SARS in Hong Kong’. Viral load also differs considerably 
between SARS and COVID-19. For SARS, it took 7 to 10 days after the 
onset of symptoms until peak RNA concentrations (of up to 5 × 10^5 copies
per swab) were reached”. In the present study, peak concentrations
were reached before day 5, and were more than 1,000 times higher. 
Successful isolation of live virus from throat swabs is another notable 
difference between COVID-19 and SARS, for which such isolation was 
Table 2 | Clinical characteristics of all patients


Patient ID no.  Comorbidity  Initial symptoms  Later symptoms  ANC per µl  ALC per µl  CRP (mg l^-1)  LDH (U l^-1)
1 Hypothyroidism Cough, fever, diarrhoea Diarrhoea 4,870 1,900 46 197
2 None Sinusitis, cephalgia, cough Hyposmia, ageusia 3,040 1,200 49 182
3 COPD Arthralgia, sinusitis, cough Dysosmia, dysgeusia 5,040 2,600 1.3 191
4 None Otitis, rhinitis Hyposmia, hypogeusia 2,420 2,220 5.9 149
7 Hypercholesterolaemia Rhinitis, cough Fever, dyspnoea, hyposmia, hypogeusia 4,690 900 4.9 209
8 None Sinusitis, cough 2,500 1,600 17 203
10 None Sinusitis, cough Fever, cough 2,350 700 78 220
14 None Fever, cough, diarrhoea 5,040 1,500 9.8 220
16 None None 4,620 900 0.5 201
ALC, absolute lymphocyte count; ANC, absolute neutrophil count; CRP, C-reactive protein; COPD, chronic obstructive pulmonary disease; LDH, lactate dehydrogenase.


rarely successful’*“*. This suggests active virus replication in tissues
of the upper respiratory tract, where SARS-CoV is not thought to replicate
in spite of detectable ACE2 expression’*”°. At the same time, the
concurrent use of ACE2 as a receptor by SARS-CoV and SARS-CoV-2
corresponds to a highly similar excretion kinetic in sputum, with active
replication in the lung. SARS-CoV was previously found” in sputum at
mean concentrations of 1.2-2.8 × 10^6 copies per ml, which corresponds
to observations made here.

Whereas proof of replication by histopathology is awaited, extended 
tissue tropism of SARS-CoV-2 with replication in the throat is strongly 
supported by our studies of cells that transcribe subgenomic mRNA in 
throat swab samples, particularly during the first 5 days of symptoms. 
Notable additional evidence for independent replication in the throat is 
provided by sequence findings in one patient, who consistently showed 
a distinct virus in the throat as opposed to the lung. In addition, the 
disturbance of gustatory and olfactory senses points at an infection 
of the tissues of the upper respiratory tract. 

Critically, the majority of patients in the present study seemed to be 
beyond their shedding peak in samples from the upper respiratory tract
when they were first tested, whereas the shedding of infectious virus in
sputum continued throughout the first week of symptoms. Together,
these findings suggest a more efficient transmission of SARS-CoV-2 than
SARS-CoV, through active pharyngeal viral shedding at a time at which
symptoms are still mild and typical of infections of the upper respira- 
tory tract. Later in the disease, COVID-19 resembles SARS in terms of 
replication in the lower respiratory tract. Of note, the two patients 
who showed some symptoms of the lungs being affected showed a 


Table 3 | IgG and IgM immunofluorescence titres against 
SARS-CoV-2, from all patients 


Patient ID no.  Initial serum: day after onset, IgG  Final serum: day after onset, IgG, IgM, PRNT90, PRNT50
1 5 <10 21 1000 100 160 >640 
2 4 <10 19 ,000 oo 40 320 
3 3 <10 23 ,000 00 160 >640 
4 5 <10 17 0,000 <10 20 160
7 6 <10 20 0,000 100 >1,280 >1,280
8 6 10 20 0,000 10 80 >320 
10 6 <10 28 ,000 ie) 10 >40 
14 NA NA 12 0,000 100 >40 >40 
16 NA NA 13 ,000 oOo 680 >320 


NA, not applicable; PRNT50, serum dilution that causes viral plaque reduction of 50%.
prolonged viral load in sputum. Our study is limited, in that no severe
cases were observed. Future studies that include severe cases should 
look at the prognostic value of an increase of viral load beyond the end 
of week 1, potentially indicating an aggravation of symptoms. 

One of the most interesting hypotheses to explain the potential exten- 
sion of tropism to the throat is the presence of a polybasic furin-type 
cleavage site at the S1-S2 junction in the SARS-CoV-2 spike protein that 
is not present in SARS-CoV”. The insertion of a polybasic cleavage site 
in the S1-S2 region in SARS-CoV has previously been shown to lead to a
moderate, but discernible, gain-of-fusion activity that might result in 
increased viral entry in tissues with a low density of ACE2 expression”. 

The combination of very high concentrations of virus RNA and the
occasional detection of cells in stools that contain subgenomic mRNA
indicates active replication in the gastrointestinal tract. Active replication
is also suggested by a much higher detection rate compared
to the Middle East respiratory syndrome coronavirus (MERS-CoV), for
which stool-associated RNA was found in only 14.6% of samples from
37 patients hospitalized in Riyadh (Saudi Arabia)”. If SARS-CoV-2 were
only passively present in the stool (such as after swallowing respiratory
secretions), similar detection rates as for MERS-CoV would be expected.
Replication in the gastrointestinal tract is also supported by analogy 
with SARS-CoV, which was regularly excreted in stool (from which it 
could be isolated in cell culture”). Our failure to isolate live SARS-CoV-2 
from stools may be due to the mild courses of cases, with only one 
case showing intermittent diarrhoea. In China, diarrhoea was seen in 
only 2 of 99 cases”. Further studies should therefore address whether 
SARS-CoV-2 shed in stools is rendered noninfectious through contact
with the gut environment. Our initial results suggest that measures 
to contain viral spread should aim at droplet-, rather than fomite-, 
based transmission. 

The prolonged viral shedding in sputum is relevant not only for the 
control of infections in hospitals, but also for discharge management. 
In a situation characterized by a limited capacity of hospital beds in
infectious disease wards, there is pressure for early discharge after 
treatment. On the basis of the present findings, early discharge with 
ensuing home isolation could be chosen for patients who are beyond 
day 10 of symptoms and have less than 100,000 viral RNA copies per 
ml of sputum. Both criteria predict that there is little residual risk of 
infectivity, on the basis of cell culture. 
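
The two discharge criteria named above can be written as a simple check. This is a minimal sketch of the rule as stated in the text (beyond day 10 of symptoms and fewer than 100,000 viral RNA copies per ml of sputum); the function and constant names are hypothetical, and the sketch illustrates the logic only.

```python
DAY_THRESHOLD = 10                  # patient must be beyond this day of symptoms
COPIES_PER_ML_THRESHOLD = 100_000   # sputum viral RNA must be below this value


def meets_early_discharge_criteria(days_since_onset, sputum_copies_per_ml):
    """Return True only if BOTH criteria from the text are satisfied."""
    return (days_since_onset > DAY_THRESHOLD
            and sputum_copies_per_ml < COPIES_PER_ML_THRESHOLD)


# Hypothetical examples
print(meets_early_discharge_criteria(11, 5e4))  # both criteria met
print(meets_early_discharge_criteria(10, 5e4))  # not yet beyond day 10
print(meets_early_discharge_criteria(12, 1e5))  # viral load not below threshold
```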

The serological courses of all patients suggest a timing of serocon- 
version similar to, or slightly earlier than, in SARS-CoV infection’. 
Seroconversion in most cases of SARS occurred during the second 
week of symptoms. As in SARS and MERS, IgM was not detected con- 
siderably earlier than IgG in immunofluorescence; this might in part 
be due to technical reasons, as the higher avidity of IgG antibodies 
outcompetes IgM for viral epitopes in the assay. IgG depletion can only 
partially alleviate this effect. Because immunofluorescence assay is a 
labour-intensive method, enzyme-linked immunosorbent assay tests 
should be developed as a screening test. Neutralization testing is nec- 
essary to rule out cross-reactive antibodies directed against endemic 
human coronaviruses. On the basis of the frequently low neutralizing 
antibody titres observed in coronavirus infection”°”’, we have here 
developed a particularly sensitive plaque-reduction neutralization 
assay. Considering the titres we observed, a simpler microneutrali- 
zation test format is likely to provide sufficient sensitivity in routine 
application and population studies. 

When aligned to viral load courses, it seems there is no abrupt virus 
elimination at the time of seroconversion. Rather, seroconversion 
early in week 2 coincides with a slow but steady decline of viral load 
in sputum. Whether properties such as the glycosylation pattern at 
critical sites of the glycoprotein have a role in the attenuation of the 
neutralizing antibody response needs further clarification. In any 
case, vaccine approaches targeting mainly the induction of antibody 
responses should aim to induce particularly strong antibody responses 
to be effective. 
Methods 


No statistical methods were used to predetermine sample size. The 
experiments were not randomized and investigators were not blinded 
to allocation during experiments and outcome assessment. 


Clinical samples and viral load conversion 

Sputum and stool samples were taken and shipped in native conditions. 
Oro- and nasopharyngeal throat swabs were preserved in 3 ml of viral 
transport medium. Viral loads in sputum samples were projected to RNA 
copies per ml, in stool samples to copies per g and in throat swabs to 
copies per 3 ml, assuming that all sample components were suspended 
in 3 ml viral transport medium. For swab samples suspended in less than 
3 ml viral transport medium, this conversion was adapted to represent 
copies per whole swab. An aggregated overview of samples received 
per day after the onset of disease from all patients is shown in Fig. 1a. 
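
The swab conversion described above amounts to multiplying a concentration by the volume of transport medium. The sketch below assumes that the RT-PCR result is first expressed as RNA copies per ml of medium; that intermediate quantity and the function name are assumptions made for illustration and are not stated in the text.

```python
def copies_per_swab(copies_per_ml_medium: float,
                    medium_volume_ml: float = 3.0) -> float:
    """Project a concentration in transport medium to copies per whole swab.

    The default of 3 ml follows the text; a smaller volume can be passed for
    swabs suspended in less medium, so that the result still represents
    copies per whole swab.
    """
    if medium_volume_ml <= 0:
        raise ValueError("medium volume must be positive")
    return copies_per_ml_medium * medium_volume_ml


# A swab in the standard 3 ml of medium, and one suspended in 1 ml.
standard = copies_per_swab(1e5)
reduced = copies_per_swab(1e5, medium_volume_ml=1.0)
print(standard, reduced)
```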


RT-PCR for SARS-CoV-2 and other respiratory viruses 

RT-PCR used targets in the E and RdRp genes as previously described”. 
Both laboratories used a pre-formulated oligonucleotide mixture 
(Tib-Molbiol) to make the laboratory procedures more reproducible. 
All patients were also tested for other respiratory viruses, including 
HCoV-HKU1, HCoV-OC43, HCoV-NL63 and HCoV-229E, influenza virus 
A, influenza virus B, rhinovirus, enterovirus, respiratory syncytial virus, 
human parainfluenza viruses 1-4, human metapneumovirus, adenovirus 
and human bocavirus using LightMix-Modular Assays (Roche). Additional 
technical details are provided in Supplementary Methods section 1. 


Virus isolation 

Virus isolation was done in two laboratories on Vero E6 cells. In brief, 
100 μl of suspended, cleared and filtered clinical sample was mixed 
with an equal volume of cell culture medium. Supernatant was collected 
after 0, 1, 3 and 5 days and used in RT-PCR analysis. Additional 
technical details are provided in Supplementary Methods section 2a. 


Serology 

We performed recombinant immunofluorescence assays to determine 
the specific reactivity against recombinant spike proteins in VeroB4 
cells, as previously described”*”*. This assay used a cloned coronavirus 
spike protein from HCoV-229E, HCoV-NL63, HCoV-OC43, HCoV-HKU1 
or SARS-CoV-2. The screening dilution was 1:10. Plaque reduction 
neutralization tests were done essentially as previously described 
for MERS-CoV”*. Serum dilutions causing plaque reductions of 90% 
(PRNT90) and 50% (PRNT50) were recorded as titres. Additional technical 
details are provided in Supplementary Methods section 2b, c. 
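
How a PRNT90 or PRNT50 titre can be read from plaque counts is sketched below. The exact read-out rules are given in the Supplementary Methods and are not reproduced here, so this sketch assumes a common convention: the titre is the highest reciprocal serum dilution, in an unbroken series from the lowest dilution, that still reduces plaques by at least the stated percentage relative to a no-serum control. All names and example counts are hypothetical.

```python
from typing import Dict, Optional


def prnt_titre(plaques_by_dilution: Dict[int, int],
               control_plaques: int,
               reduction_percent: int = 90) -> Optional[int]:
    """Return the reciprocal titre, or None if even the lowest dilution fails.

    plaques_by_dilution maps reciprocal dilution (e.g. 10 for 1:10) to the
    plaque count observed with serum at that dilution.
    """
    titre = None
    for dilution in sorted(plaques_by_dilution):
        reduced = control_plaques - plaques_by_dilution[dilution]
        # Integer arithmetic avoids floating-point edge cases at the cutoff.
        if reduced * 100 >= reduction_percent * control_plaques:
            titre = dilution
        else:
            break  # stop at the first dilution that no longer meets the cutoff
    return titre


# Hypothetical counts with 40 plaques in the no-serum control.
counts = {10: 0, 20: 1, 40: 3, 80: 12, 160: 30}
print(prnt_titre(counts, 40, 90), prnt_titre(counts, 40, 50))
```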


Statistical analyses 
Statistical analyses were done using SPSS software (version 25) or GraphPad 
Prism (version 8). 


Ethical approval statement 

All patients provided informed consent for the use of their data and 
clinical samples for the purposes of the present study. Institutional 
review board clearance for the scientific use of patient data has been 
granted to the treating institution by the ethics committee at the 
Medical Faculty of the Ludwig Maximilians Universität Munich 
(vote 20-225 KB). 


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


Sequence data are available in GISAID under accession number 
EPI_ISL_406862. All other data are available from C.D. upon reasonable 
request. 
Extended Data Fig. 2 | Recombinant SARS-CoV-2-spike-based 
immunofluorescence test shows seroconversion of patient no. 4. 
Representative outcome of a recombinant immunofluorescence test using 
serum dilutions 1:10, 1:100, 1:1,000 and 1:10,000 of patient no. 4 at 5 and 
17 days after the onset of symptoms. Secondary detection was done by using 
a goat-anti human immunoglobulin labelled with Alexa Fluor 488 (shown in 
green). The experiment was performed in duplicate. 


Extended Data Table 1 | IgG immunofluorescence titres against endemic human coronaviruses 


Patient ID | Primary serum                               | Final serum 
           | Day p.o.  OC43    NL63    HKU1    229E      | Day p.o.  OC43    NL63    HKU1    229E 
#1         | 5         1,000   1,000   1,000   100       | 15        1,000   1,000   1,000   100 
#2         | 4         1,000   1,000   100     100       | 13        10,000  100     1,000   10 
#3         | 3         10,000  100     1,000   1,000     | 16        10,000  1,000   10,000  1,000 
#4         | 5         1,000   100     100     100       | 17        10,000  10      1,000   100 
#7         | 6         1,000   100     1,000   1,000     | 13        10,000  1,000   10,000  10,000 
#8         | 6         1,000   100     1,000   1,000     | 10        10,000  1,000   10,000  100 
#10        | 6         1,000   100     100     1,000     | 11        10,000  1,000   100     1,000 
#14        | na        na      na      na      na        | 5         100     100     100     100 
#16        | na        na      na      na      na        | 13        10,000  1,000   1,000   100 


p.o., post onset; na, not available. Increases of titre through the final serum are indicated by reciprocal titres in bold. 
The gut of healthy human neonates is usually devoid of viruses at birth, but quickly 
becomes colonized, which, in some cases, leads to gastrointestinal disorders’ *. Here 
we show that the assembly of the viral community in neonates takes place in distinct 
steps. Fluorescent staining of virus-like particles purified from infant meconium or 
early stool samples shows few or no particles, but by one month of life particle 
numbers increase to 10⁹ per gram, and these numbers seem to persist throughout 
life>’. We investigated the origin of these viral populations using shotgun 
metagenomic sequencing of virus-enriched preparations and whole microbial 
communities, followed by targeted microbiological analyses. Results indicate that, 
early after birth, pioneer bacteria colonize the infant gut and by one month prophages 
induced from these bacteria provide the predominant population of virus-like 
particles. By four months of life, identifiable viruses that replicate in human cells 
become more prominent. Multiple human viruses were more abundant in stool 
samples from babies who were exclusively fed on formula milk compared with those 
fed partially or fully on breast milk, paralleling reports that breast milk can be 
protective against viral infections® ’°. Bacteriophage populations also differed 
depending on whether or not the infant was breastfed. We show that the colonization 
of the infant gut is stepwise, first mainly by temperate bacteriophages induced from 
pioneer bacteria, and later by viruses that replicate in human cells; this second phase 
is modulated by breastfeeding. 


To investigate early-life viral colonization, we first analysed stool sam- 
ples from 20 healthy infants longitudinally (Supplementary Table 1). 
Samples included meconium and/or early stool samples collected 0-4 
days after birth (median of 17 h after birth, range 11-152 h; hereafter 
‘month 0’) and stool samples collected at 1 and 4 months of life. The 
cohort consisted of self-identified African-American mothers from an 
urban US setting and their infants. As an initial step, virus-like particles 
(VLPs) were purified from meconium or stool and stained with SYBR 
gold, which binds nucleic acids. VLPs were subsequently visualized by 
epifluorescence microscopy (Fig. 1a). VLPs were undetectable in most 
of the meconium samples; only 3 out of 20 samples had detectable 
counts of VLPs (Fig. 1b). By month 1, most samples were positive, and 
VLP counts averaged 1.6 × 10⁹ per gram of stool; values at month 4 were 
similar to the 1-month samples. We also tested the VLP counts from 12 
2–5-year-old children, which had an average of 9.4 × 10⁸ per gram of 
stool; these results were not distinguishable from month-1 and month-4 
samples (P= 0.48, Wilcoxon rank-sum test). This number is also close 
to that reported for adults® ’. We therefore conclude that the high VLP 
counts seen in 1-month-old infants typically persist into adulthood. 


To characterize bacterial content, DNA was purified from whole 
meconium or stool samples and analysed by qPCR to quantify the copy 
numbers of the bacterial 16S rRNA gene (Fig. 1c). Some published stud- 
ies have suggested that microbial colonization of the infant begins 
in utero", but recent studies indicate that colonization more likely 
begins with the rupture of membranes and delivery” “. Quantification 
showed low or undetectable levels of bacterial 16S rRNA genes for 14 
of the 20 meconium/early stool samples from month 0, and relatively 
low levels for the other 6 (median, 3.3 x 10° per gram of stool). By contrast, 
for months 1 and 4, most samples were positive for 16S rRNA 
gene sequences, with a median value of 3.1 x 10° per gram of stool. To 
analyse the early-life microbiome further, total DNA from each sample 
was subjected to metagenomic sequencing. For the month-0 samples, 
many were dominated by human DNA, which is characteristic of the 
neonatal gut before colonization (Extended Data Fig. 1a). Some of 
the month-0 samples also contained bacterial DNA, indicating early 
colonization and/or reagent contamination. Levels of human DNA 
decreased with time after delivery (P = 0.04, Spearman’s rank-order 
correlation ρ = −0.45; Extended Data Fig. 1b), consistent with bacterial 
Fig. 1 | Detection and characterization of VLPs in infant gut samples. 

a, Representative fields of fluorescently stained VLPs from infant stool 
sampled at months 0, 1 and 4. Scale bars, 10 μm. b, Quantification of VLP counts 
per gram faeces. The minimum level of quantification was 6.6 x 10° particles 
per gram. Per sample, 5-10 fields were quantified. c, Copy numbers of bacterial 
16S rRNA genes analysed using qPCR. The minimum level of quantification was 
2,000 copies per gram. d, VLP richness was assessed using VLP metagenomic 
sequence data. Sequence reads were assembled into contigs, and contigs with 


colonization. Bacterial DNA predominated in month-1 and month-4 
samples (Extended Data Fig. 1a). The bacterial richness and diversity 
at month 0 was lower compared to month-1 and month-4 samples. 
Early bacterial colonizers included Proteobacteria, Actinobacteria, 
Bacteroidetes and Firmicutes (Extended Data Fig. 1c—e), consistent 
with previous studies”. 

To investigate the origin of the early life virome, DNA and RNA were 
purified from preparations of VLPs from stool from each of the three 
time points and characterized by metagenomic sequencing. After 
filtering out human DNA, we assembled sequence reads into contigs 
and annotated open-reading frames. Previous literature indicates 
that many gut viruses are uncharacterized bacteriophages (hereaf- 
ter phages)'"*"”, which are challenging to identify in metagenomic 
sequence data because the proportion of phage genomes in databases 
is small compared to the number of global phage types. Viruses that 
infect human cells are more fully characterized and thus more readily 
recognizable. To address this challenge, we required that half of all 
reading frames within a contig were annotated as viral to assign that 
contig as a virus. Quantification of viral species richness showed low 
values at month 0, but higher richness after 1 and 4 months (Fig. 1d). 
After taxonomic assignment of viral contigs, we found that, despite 
the difficulty of identifying phages, the majority of VLP classifications 
were in fact phage families (Fig. 1e). Most were from DNA phages, 
consistent with the reported rarity of RNA phages”. For DNA phages, 
an average of 31% of reads could be assigned as viral at month 1 and 
38% at month 4. The nature of the remainder is unknown but some 
reads probably represent unstudied phages. Values were lower at 
month 0 (11%), which probably reflects a relative increase in background 
due to the low numbers of recovered particles (Extended Data 
Fig. 2a–h). 
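
The two thresholds used for viral calling (at least half of a contig's open-reading frames annotated as viral, and, per the legend of Fig. 1d, at least 10 aligned reads per million reads in a sample) can be combined into a richness count as sketched below. The record layout and function names are hypothetical; the actual pipeline is not described at this level of detail in the text.

```python
from typing import Iterable, Mapping


def is_viral_contig(n_viral_orfs: int, n_total_orfs: int,
                    min_fraction: float = 0.5) -> bool:
    """A contig is assigned as viral if enough of its ORFs are annotated viral."""
    if n_total_orfs == 0:
        return False
    return n_viral_orfs / n_total_orfs >= min_fraction


def reads_per_million(aligned_reads: int, total_reads: int) -> float:
    """Normalize an aligned-read count to reads per million sample reads."""
    return aligned_reads * 1_000_000 / total_reads


def viral_richness(contigs: Iterable[Mapping[str, int]], total_reads: int,
                   min_rpm: float = 10.0) -> int:
    """Count viral contigs that pass the presence threshold in one sample."""
    return sum(
        1
        for c in contigs
        if is_viral_contig(c["viral_orfs"], c["total_orfs"])
        and reads_per_million(c["aligned_reads"], total_reads) >= min_rpm
    )


# Hypothetical sample with 2,000,000 reads and four assembled contigs.
sample = [
    {"viral_orfs": 6, "total_orfs": 10, "aligned_reads": 50},   # viral, 25 RPM
    {"viral_orfs": 2, "total_orfs": 10, "aligned_reads": 500},  # not viral
    {"viral_orfs": 5, "total_orfs": 10, "aligned_reads": 10},   # viral, 5 RPM
    {"viral_orfs": 3, "total_orfs": 4, "aligned_reads": 20},    # viral, 10 RPM
]
print(viral_richness(sample, total_reads=2_000_000))
```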




viral characteristics (at least 50% of open-reading frames annotated as viral) 
were enumerated. Viral species were called present if at least 10 reads per 
million reads from one sample aligned to that contig. e, Taxonomic 
assignments of VLP sequences. Reads were associated with viral lineages based 
on the annotation of viral contigs. In b–d, violin plots represent the 
distribution of the individual datasets; samples were compared using 
two-sided Wilcoxon signed-rank tests. 


To assess community interactions, we compared bacterial abundances 
from 16S rRNA gene qPCR data, bacterial richness and bacterial 
diversity from sequence data with VLP counts, viral richness and viral 
diversity, and found strong positive correlations (Extended Data Fig. 3a, b 
and Supplementary Table 2). 

For the minority of viruses detected that are known to replicate in 
human cells (Fig. 1e), at month 0 a single sample was positive for Herpesviridae 
and another for Picornaviridae. By month 4, human-cell 
viruses were more prominent, including Adenoviridae, Anelloviridae, 
Caliciviridae and Picornaviridae. 

Either of two modes of phage production could generate the 
observed VLP populations”. Lytic phages only grow by infection, 
replication and lysis (Extended Data Fig. 4a). Previous reports, which 
focused on older infants and adults, have suggested that lytic growth 
and predator-prey interactions between phages and bacteria were 
prominent in early-life communities”. Temperate phages can also rep- 
licate using a second strategy, which involves the integration of the 
phage DNA into the host bacterial DNA, followed by quiescent growth 
as an integrated prophage (Extended Data Fig. 4a). Exposure to an 
inducing signal causes integrated prophages to excise and resume lytic 
growth. Induction can also take place at low levels spontaneously”. The 
prophage state is commonly maintained by repressor proteins, which 
also serve to exclude infection by similar or identical (homoimmune) 
phage strains”. 

To test whether the early-life virome is composed of strictly lytic 
phages, we purified 24 bacterial strains from the infant gut, including 
three Escherichia coli, three Klebsiella and ten Enterococcus strains 
(Supplementary Table 3), and plated virome fractions from the cog- 
nate infant back on these bacteria. In no case did virome VLPs form 
plaques on lawns of these bacteria, thus providing no evidence that 
Fig. 2 | Prophage induction as the dominant contributor to the early-life 
virome. a, Heat map quantifying VLP production from 24 strains isolated from 
the faeces of the infants studied. The bacterial genera are summarized on the 
left; columns summarize the numbers of fluorescent particles produced per ml 
of stationary-phase culture (according to the scale at the bottom). Columns 
compare particle production with and without an inducer (mitomycin C) as well 
as growth under aerobic and anaerobic conditions. b, Draft genome 
(horizontal line) of Enterococcus faecalis from one of the infants studied, 
showing the alignment frequency of reads from VLP preparations. Reads were 
aligned to the bacterial genomes that were generated from VLPs from pure 
culture after mitomycin treatment (red), from VLPs from pure culture in the 


lytic replication occurs. Note that temperate phages are not expected 
to form plaques on host cells that already contain those phages as an 
integrated prophage due to repressor-mediated homoimmune exclu- 
sion”. Analysis of our DNA VLP contigs using PHACTS, a random 
forest-based approach to classifying phage lifestyles™, indicated that 
most of the genomes more closely resembled temperate phages than 
lytic phages (Extended Data Fig. 4b). 

We next investigated whether virome populations resulted from 
the induction of prophages by quantifying VLP production from the 
24 infant bacterial strains described above. Bacterial strains were ana- 
lysed for spontaneous VLP production during growth in liquid cul- 
ture, and for production after induction with the DNA-damaging agent 
mitomycin C, using the fluorescent staining assay for VLP particles. 
Experiments were carried out under both aerobic and anaerobic condi- 
tions. We detected spontaneous VLP production in 32% of strains. After 
induction with mitomycin C, 80% of strains produced VLPs (Fig. 2a). 
In total, 16 strains showed VLP production of at least 10’ particles per 
ml under at least one condition. We therefore conclude that bacteria 
in the infant gut are commonly capable of high-level VLP production 
following prophage induction. 

The hypothesis that prophage induction yields the bulk of VLPs 
in infant gut samples predicts that VLP sequences found in stool 
samples should be detectable as integrated prophages in bacterial 
genome sequences. We therefore sequenced genomes of the 24 infant 
bacterial strains described above and of VLPs produced from those 
absence of any inducer (blue) and from VLPs isolated from the stool of the 
infant from which the bacterial strain was isolated (green). Peaks indicate the 
detection of integrated prophages. One putative bacteriophage genome is 
shown below, with gene types colour-coded as indicated. c, As in b, but for a 
Klebsiella pneumoniae isolate. d, Correlation between the abundance of VLPs 
present in infant stool and the abundance of the bacteria harbouring those 
prophages in the same stool sample (n = 33 phage contigs from 16 bacterial 
isolates from month-1 and month-4 strains). The black line shows the linear 
regression line and the grey-shaded region shows the 95% confidence interval 
for the slope (two-sided Spearman’s rank-order correlation). 


strains in the presence or absence of mitomycin C (Supplementary 
Table 3). Prophage sequences could be readily detected in the bacte- 
rial genomes, and many of these were detectable both in sequences 
from the induced VLP samples and in the VLP samples from infant 
stool. Examples are shown in Fig. 2b, c, in which the VLP sequence 
reads are shown aligned to bacterial contigs, so that the spikes indi- 
cate VLP detection after mitomycin C induction, in the absence of 
induction and in purified stool VLPs. To test the infant specificity of 
each community, we mapped the stool VLP reads back to the viral 
contigs assembled from VLPs induced from the 24 bacterial strains. 
Although the steps of VLP nucleic acid amplification before sequenc- 
ing can distort abundances, we nevertheless found that the induced 
VLPs from each purified bacterial strain were more similar to stool 
VLP sequences from the infant from which the bacterial strain was 
isolated than to VLPs from unmatched infants (P< 0.0001; Extended 
Data Fig. 5a), consistent with the production of phages in the infant 
gut by prophage induction. 

In addition, there was a significant positive correlation between 
the proportion of each bacterium in the infant gut community and 
the proportion of prophages from that bacterial species in the gut 
virome of the associated infant (P= 0.0008; Spearman’s rank-order 
correlation p = 0.53; Fig. 2d and Extended Data Fig. 5b). The abundance 
of the bacterial strains in the gut communities ranged from 0.03% to 
99.1%, indicating that in at least some cases a large proportion of the 
gut community was analysed. 
Fig. 3 | Breastfeeding and viral colonization of the infant gut. 

a, Quantification of the percentage of infants who were positive for viruses of 
human cells in metagenomic virome sequence data. Sample sizes and cohorts 
studied are indicated at the top. The two feeding types are colour-coded as 
indicated. The summation over all viral families is shown at the bottom (total). 
b, Comparison of human virus colonization based on feeding type using qPCR. 
Three technical replicates were compared for each sample. In a, b, the numbers 
of infants fed with formula and fed with breast milk or mixed feeding are, 
respectively, 14 and 6 in the discovery cohort, 46 and 79 in the validation cohort 
from US urban areas, and 30 and 70 in the validation cohort from Africa. 

c, Abundances of the bacterial genera of Bifidobacterium and Lactobacillus 


The abundant crAssphages, which infect Bacteroides and do not inte- 
grate during replication®*”®, were scarce in samples from month 1, but 
more common by month 4 and in samples from 2-5-year-old children 
(Extended Data Fig. 6). Evidently this group of lytic phages colonizes 
children predominantly later in life, potentially reflecting sequen- 
tial acquisition of Bacteroides strains and later crAssphages from the 
environment. 

Our findings support the idea that prophage induction from pio- 
neer bacteria is the main source of the observed virome community 
by month 1. This is supported by the findings that: (1) replication of 
lytic phages was undetectable; (2) many purified bacterial strains from 
infants produced VLPs at high levels; (3) sequences of genomes from 
these induced VLPs could be identified as integrated prophages in 
bacteria isolated from these infants; (4) stool VLP genome sequences 
could be identified as integrated prophages in the bacterial genomes; 
(5) VLP abundance in stool was proportional to the abundance of the 
host bacteria in the same sample; and (6) VLP contigs were annotated 
primarily as lysogenic phages and not lytic phages. 

We then compared features of the VLP data from infant stool samples 
to metadata on feeding history, mode of delivery, sex and other variables 
(Supplementary Table 1). We found a strong influence of breastfeed- 
ing, which was associated with the lower accumulation of viruses that 
replicate in human cells. Taking a conservative threshold for detection, 
requiring coverage of 33% of the viral genome, viruses that infect human 
cells were only found in those infants who were exclusively fed formula 
milk (Fig. 3a and Extended Data Fig. 7a). Statistically, this achieved a P 
value of only 0.11 (Fisher’s exact test) owing to the small sample size and 




separated by feeding type. Data were obtained using total microbial shotgun 
sequencing. d, Percentage of infants positive for phages annotated as infecting 
Bifidobacterium or Lactobacillus, assessed using VLP metagenomic 
sequencing. In c, d, 103 (formula, n = 59; breast milk or mixed, n = 44) samples 
from both discovery and validation cohorts were used for which whole-stool 
shotgun sequencing data were available. a, b, d, Samples were compared using 
two-sided Fisher’s exact tests. Data summarize population proportions and 
95% confidence intervals. c, Samples were compared using two-sided Wilcoxon 
rank-sum tests with FDR correction. Data are mean ± s.e.m. a–d, ***P < 0.001, 
**P < 0.01, *P < 0.05. e, Summary of the findings of this study. 


unbalanced distribution (Fig. 3a and Extended Data Fig. 7b). Delivery type 
(spontaneous vaginal delivery versus caesarean section) did achieve sta- 
tistical significance (P= 0.01, Fisher’s exact test; Extended Data Fig. 7f, g). 

To challenge these conclusions, we analysed a validation cohort of an 
additional 125 infants, focusing on stool samples taken at 3-4 months 
of life. These samples were obtained from mixed-race cohorts of urban 
US infants (Supplementary Table 1). In these samples, delivery mode 
did not show a significant influence (Extended Data Fig. 7h-j), but a pro- 
tective effect of breastfeeding was evident—30% of formula-fed babies 
were positive for viruses that infect human cells, whereas 9% of babies 
who were fed breast milk or breast milk together with formula were 
positive (P= 0.003, Fisher’s exact test; Fig. 3a). Repeating the analysis 
requiring 0.1% coverage to 60% coverage of viral genomes for scoring 
detection yielded similarly significant results (Extended Data Fig. 7c, 
d). A comparison after normalizing for sequencing depth also yielded 
a significant difference (P < 0.0001, Wilcoxon rank-sum test; Extended 
Data Fig. 7e). As a control, preparations of formula were subjected to 
VLP purification and sequence analysis, which yielded no detections of 
viruses that infect animal cells (data not shown), indicating that these 
viruses were unlikely to originate as contamination from the formula 
products themselves. 
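
The comparison of positive proportions between feeding groups uses a two-sided Fisher's exact test, which can be computed directly from the hypergeometric distribution. The sketch below is a generic implementation, not the authors' analysis code. The example counts are back-calculated from the reported percentages and group sizes (about 30% of 46 formula-fed infants and about 9% of 79 breastfed or mixed-fed infants) and are therefore assumptions; the exact counts are not given in the text.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are no
    more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x positives in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))


# Assumed counts: 14 of 46 formula-fed positive, 7 of 79 breastfed/mixed positive.
p_value = fisher_exact_two_sided(14, 46 - 14, 7, 79 - 7)
print(f"P = {p_value:.4f}")
```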

To validate the metagenomic detections, VLP DNA and RNA samples 
were also tested by qPCR for their content of Adenovirus, Torque teno 
virus, Enterovirus, Astroviridae, Sapovirus and Norovirus sequences 
(Supplementary Table 4). The qPCR analysis also showed enrichment 
of viruses that infect human cells in the exclusively formula-fed cohort 
(P= 0.0002, Fisher’s exact test; Fig. 3b). 
Both populations studied above were from urban cohorts in the 
United States. To investigate whether our results hold more broadly, we 
analysed samples from a cohort of African newborns from Botswana. 
Infants who were 4 months of age were sampled using rectal swabs, 
so only qPCR assays and not sequence-based assays were carried out. 
All infants were delivered vaginally. We again found an association 
between exclusive formula feeding and colonization of viruses that 
grow in human cells (P= 0.011, Fisher’s exact test; Fig. 3b). 

Feeding type and other variables were then tested for effects on 
phage populations. Phage genes were annotated on assembled contigs, 
and their abundances used to calculate Bray—Curtis distances between 
communities. Significant differences in the population structure of 
phages could be detected based on feeding mode (P= 0.001, permu- 
tational multivariate analysis of variance (PERMANOVA); Extended 
Data Fig. 8a, c), but not for another 12 metadata variables (Extended 
Data Fig. 8a—e). To investigate the taxa involved, the shotgun sequenc- 
ing analysis of whole stool from pooled discovery and validation 
cohorts was queried, which showed that whole-stool samples from 
breastfed infants contained a higher abundance of Bifidobacterium 
(false-discovery rate (FDR)-corrected P= 0.02; Fig. 3c) and Lactobacil- 
lus (FDR-corrected P= 0.03; Fig. 3c). Consistent with host abundances, 
VLP sequences that aligned to the temperate phages of Bifidobacterium 
and Lactobacillus were also enriched in breastfed infants (P=0.03 and 
P<0.0001, Fisher’s exact test; Fig. 3d), in part explaining the effects of 
feeding mode on phage populations. 
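
The Bray-Curtis distance used to compare phage communities is the summed absolute difference in gene abundances divided by the summed total abundance of both samples. A minimal sketch follows; the gene names and abundance values are hypothetical, and whether abundances were normalized before the distance calculation is not stated in the text.

```python
from typing import Mapping


def bray_curtis(x: Mapping[str, float], y: Mapping[str, float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles.

    Features missing from one profile are treated as abundance 0.
    Returns 0.0 for identical profiles and 1.0 for profiles with no
    shared features.
    """
    features = set(x) | set(y)
    numerator = sum(abs(x.get(f, 0) - y.get(f, 0)) for f in features)
    denominator = sum(x.get(f, 0) + y.get(f, 0) for f in features)
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return numerator / denominator


# Hypothetical phage-gene abundances in two infant stool samples.
infant_a = {"integrase": 3, "capsid": 1}
infant_b = {"integrase": 1, "capsid": 3}
print(bray_curtis(infant_a, infant_b))
```

A matrix of such pairwise distances is the usual input to PERMANOVA, which then tests whether between-group distances exceed within-group distances.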

In summary, our data indicate that viral colonization in early life 
is stepwise, with the first phase characterized by the induction of 
prophages from pioneer bacteria, and a second phase involving col- 
onization with viruses that infect human cells, the latter of which is 
modulated by breastfeeding (Fig. 3e). Previous epidemiological studies 
have emphasized the protective effects of breastfeeding in reducing 
viral gastroenteritis and infant death®”°. Mixed feeding of formula and 
breast milk is also reported to be protective compared with feeding 
of formula only’, as was seen in the metagenomic analysis reported 
here. Factors in breast milk that are known to inhibit viral colonization 
include maternal antibodies, human milk oligosaccharides, lactoferrin 
and additional breast milk proteins”’”’. The work reported here further 
develops our understanding of the protection conferred by breast- 
feeding in several respects. The metagenomic data (1) document the 
extent of subclinical infections with potentially pathogenic viruses; (2) 
highlight the potency of viral inhibition by breastfeeding; and (3) reveal 
the diversity of viruses affected, including viral families that cannot be 
grown in the laboratory and for which inhibition by breastfeeding is 
unstudied. In the African cohort, we not only found viruses that grow 
in human cells more commonly in exclusively formula-fed babies, but 
we also found more colonization in both feeding groups compared to 
US babies, emphasizing potential opportunities to intervene to reduce 
viral transmission to infants. The metagenomic methods described 
here will be useful for the assessment of the effects of different feed- 
ing strategies in diverse global settings to optimize the protection of 
infants from viral infections in the gut. 
Methods 


Data reporting 

No statistical methods were used to predetermine sample size. The 
experiments were not randomized and the investigators were not 
blinded to allocation during experiments and outcome assessment. 


Experimental model and human participants 

Three cohorts of newborn infants were studied. Detailed information 
is provided in Supplementary Table 1. All experimentation complied 
with ethical regulations, and written informed consent was obtained 
from all human participants or their parents. 

The Infant Growth and Microbiome Study (IGram) was approved by 
the Committee for the Protection of Human Subjects at The Children’s 
Hospital of Philadelphia (IRB14-010833). African-American women 
planning to deliver at the Hospital of the University of Pennsylvania 
and their infants were enrolled. Inclusion and exclusion criteria are 
listed in Supplementary Table 5. Study visits were conducted at The 
Children’s Hospital of Philadelphia. A total of 20 healthy, term infants 
were recruited for the discovery cohort. Stool samples were collected 
longitudinally at 0-4 days after birth (meconium samples, month 0), 
month 1 and month 4. The participants in an independent validation 
cohort had the same inclusion and exclusion criteria as the discovery 
cohort (only at month 4, n= 86). Fresh stool specimens from healthy 
infants were collected from diapers and aliquoted into faeces collec- 
tion tubes (Sarstedt). All samples were stored at —80 °C. Metadata 
regarding delivery mode, infant feeding and health outcomes were 
collected by medical chart review and in-person interview by trained 
research personnel. 

The Microbiome, Antibiotic and Growth Infant Cohort (MAGIC) 
Study was approved by the Committee for the Protection of Human 
Subjects at Children’s Hospital of Philadelphia (IRB 15-012623). The 
study enrolled children born at Pennsylvania Hospital, receiving pre- 
ventive healthcare in the CHOP Primary Care Network or participating 
in private practices, together with their biological mothers. The distri- 
bution of race, ethnicity and sex of the newborns reflected the general 
distribution in the participating sites. All infants enrolled were less than 
120 h of age, greater than 36 weeks gestation, heavier than 2,000 g and 
spent fewer than 120 h in the neonatal care unit. Mothers were over the 
age of 18 and spoke English. A total of 39 healthy, term babies were used 
for this cohort. Study visits were conducted at Children’s Hospital of 
Philadelphia. Stool samples were collected and questionnaires admin- 
istered at birth and every 3 months until the infant reached 24 months 
of age. Stool samples obtained at 3 months of life were used for this 
cohort. Fresh stool specimens were collected at home by parents using 
a sterile faecal collection tube to scoop a pea-sized amount from a used
diaper. Samples were then transported by courier on dry ice, aliquoted
and stored at −80 °C. Maternal and baby clinical data and metadata were
collected by medical chart review and parent questionnaires. 

The Botswana Infant Microbiome Study was approved by the Bot- 
swana Ministry of Health (IRB HPDME 13/8/1) and Institutional Review 
Boards at the University of Pennsylvania (IRB 822692) and Duke Univer- 
sity (IRB 319561). Mother-infant pairs (n = 300) were enrolled within 
48 h of delivery at the Princess Marina Hospital and two public clinics
in or near Gaborone, Botswana. Exclusion criteria included maternal
age less than 18 years, infant birth weight lower than 2,000 g, multiple
gestation pregnancy and caesarean delivery. Participants were
seen for monthly study visits until the infant was 6 months of age and 
every other month thereafter until the infant was 12 months of age. At 
all visits, a questionnaire was administered and clinical samples were 
obtained from the infant and the mother. Rectal swab samples obtained 
at 4 months of age from 100 infants were used for this cohort. These 
samples were collected into eNAT medium (Copan Italia), stored on 
ice and transported within 4 h to the National Health Laboratory in 
Gaborone for processing and storage at −80 °C. Metadata, including
data regarding infant-feeding practices, were collected by medical
chart review and in-person interview by trained research personnel. 

In no cases were infants from any cohort reported to be suffering 
from gastroenteritis at the time of sampling. 


Purification of VLPs from stool samples 
VLPs were purified as previously described. Approximately 200 mg
of stool was homogenized in 30 ml of SM buffer (50 mM Tris-HCl
pH 7.5, 100 mM NaCl, 8 mM MgSO4), spun down and filtered through
a 0.2-µm-pore-size filter (Thermo Fisher Scientific). The filtrate was
concentrated using a 100-kDa-molecular-mass Amicon Ultra-15 Centrifugal
Filter (Millipore), resuspended in 30 ml SM buffer and concentrated
for the second time to a final volume of around 500 µl. The
concentrate was treated with DNase I and RNase (Roche) at 37 °C for
30 min to degrade nonencapsulated nucleic acids. A total of 200 µl
VLP preparation was used for viral nucleic acid extraction immediately 
after DNase I and RNase treatment; the remainder was stored at 4 °C 
for up to 3 months. To detect enveloped viruses, no chloroform was 
used to treat the VLPs. To test the purification efficiency of VLPs, 16S 
qPCR was used to quantify the 16S copy number before (total microbial 
DNA) and after purification of VLPs (VLP viral DNA). Samples showed 
an average reduction of 99.9% after purification (Extended Data Fig. 9). 
Control spiking experiments with bacteriophage lambda showed 
that after addition to stool, approximately 90% of plaque-forming 
units could be recovered using the above methods. 
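
The purification efficiency quoted above is a simple ratio of 16S copy numbers before and after VLP purification. As a minimal sketch (the function name and the copy numbers are illustrative assumptions, not values from this study), the percentage reduction could be computed as follows:

```python
def percent_reduction(copies_before: float, copies_after: float) -> float:
    """Percentage drop in 16S copy number after VLP purification."""
    if copies_before <= 0:
        raise ValueError("copies_before must be positive")
    return (1.0 - copies_after / copies_before) * 100.0


# Hypothetical example: 1e9 copies in total microbial DNA, 1e6 copies in VLP DNA.
example_reduction = percent_reduction(1e9, 1e6)
```

With these made-up inputs the reduction is approximately 99.9%, the same order as the average reported above.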


VLP enumeration 

Purified VLPs (35 µl) were diluted in 10 ml SM buffer and filtered onto a
0.02-µm Anodisc polycarbonate filter (Whatman). Filters were stained
with 2× SYBR Gold (Thermo Fisher Scientific) for 15 min, then washed
with H2O once. After drying, the filter was mounted on a glass slide with
15 µl of mountant buffer (100 µl 10% ascorbic acid, 4.9 ml pH 7.4 PBS,
5 ml 100% glycerol; filtered through a 0.02-µm filter). For each filter,
viruses were counted in 5-10 randomly selected fields of view. The
filter was visualized using a motorized inverted system microscope
IX81 (Shinjuku) for fluorescence. VLPs were counted using ImageJ.
Stained particles of less than 0.5 µm in diameter were regarded as VLPs
(larger particles were not counted). Purified lambda phages with known 
plaque-forming unit counts per ml were used as a positive control to 
adjust image colour, saturation, level and contrast. VLPs mock-purified 
from SM buffer were used as negative controls. At least one count per 
microscope field was set as a threshold for a positive detection, which 
was equal to around 6.6 x 10° counts per gram faeces. Lower than this 
threshold, the VLP counts were considered to be below the limit of 
detection. The results are listed in Supplementary Table 2. 


Extraction and amplification of viral nucleic acids 

Viral DNA and RNA were extracted from VLPs using the AllPrep DNA/ 
RNA Mini kit (Qiagen) according to the manufacturer’s instructions. 
DNA was stored at −20 °C and RNA at −80 °C. Viral DNA was subjected
to DNA whole-genome amplification using the GenomiPhi V2 Amplification
kit (GE Healthcare). Viral RNA was treated with DNase, reverse
transcribed and PCR-amplified as previously described. Specifically,
20 µl of RNA was treated with 10 units of RNase-free, recombinant
DNase (Roche) for 20 min at 37 °C. A total of 5 µl of each sample was
then reverse transcribed. First-strand cDNA synthesis was completed 
using the SuperScript III First Strand Synthesis kit (Thermo Fisher Sci- 
entific) and primer A (5’-GTTTCCCAGTCACGATCNNNNNNNNN-3’), to 
allow for random priming. ‘N’ indicates a mixture of all four bases.
Second-strand synthesis was performed using DNA Polymerase I, Large 
(Klenow) Fragment (New England Biolabs). The dsDNA product was 
then amplified by adding primer B (5’-GTTTCCCAGTCACGATC-3’) with
AccuPrime Taq High Fidelity DNA polymerase (Thermo Fisher Scientific)
with the following reaction conditions: 75.5 µl of molecular-grade
H2O, 10 µl of 10× PCR buffer I, 4 µl of 50 mM MgCl2, 2.5 µl of 10 mM dNTPs,
1 µl of 100 µM primer B, 1 µl Taq and 6 µl dsDNA product. Products were
amplified at 94 °C for 2 min, 94 °C for 30 s, 40 °C for 30 s, 50 °C for 30 s,
72 °C for 1 min for 40 cycles. Amplified DNA and cDNA were stored at
−20 °C.

For the African cohort, samples were only available as rectal swabs 
stored in eNAT medium (Copan Italia), which contains guanidine
thiocyanate. It was therefore not possible to purify VLPs before analysis.
On the basis of experience, such samples have such high human DNA 
content that shotgun metagenomic analysis yields overwhelmingly 
human sequences—we therefore carried out only qPCR analysis of these 
samples. Nucleic acids were purified using the AllPrep DNA/RNA Mini 
kit (Qiagen) as described above. No pre-amplification was performed 
for either DNA or RNA. 


Total microbial DNA extraction 

Approximately 200 mg of stool was used for total microbial DNA extrac- 
tion. Total microbial DNA was purified from each sample using the Mo 
Bio PowerSoil kit (Mo Bio) following the manufacturer’s instructions. 
A total of 50 pl total microbial DNA was obtained for each sample and 
stored at −20 °C.


Stool virome library and total microbial shotgun library 
construction and sequencing 

Amplified viral DNA, cDNA and total microbial DNA were used for the 
construction of the shotgun libraries. The DNA concentration was 
measured using the Quant-iT PicoGreen dsDNA Assay kit (Thermo 
Fisher Scientific) and the fluorescence was detected by an EnVision 
Multilabel Plate Reader (Waltham). Libraries were made using an Illu- 
mina Nextera XT Samples Prep kit (Illumina), quantified using both the 
Quant-iT PicoGreen dsDNA Assay kit and the KAPA Library Quantifica- 
tion kit (Kapa Biosystems). The size distribution of the libraries was 
checked by 5300 Fragment Analyzer (Agilent). Libraries were pooled 
for sequencing. The concentration of the pooled libraries was meas- 
ured using Qubit (Invitrogen) and the size distribution of the pooled 
libraries was checked by Agilent Technology 2100 Bioanalyzer using a 
High Sensitivity DNA chip (Agilent). Sequence was acquired using the 
Illumina MiSeq (250-bp paired-end reads, Illumina) and HiSeq (150-bp
paired-end reads, Illumina). 


Isolation of bacterial strains 

A total of 24 bacterial strains was isolated from the stool samples 
(19 samples from 12 infants) using three types of medium: Lysog- 
eny broth medium in aerobic conditions, Bifidus selective medium 
(Sigma-Aldrich) and eosin methylene blue medium (Sigma-Aldrich) 
in anaerobic conditions, which were incubated at 37 °C for up to 72 h. 
Single colonies were picked and re-streaked in medium plates at 
least three times to isolate pure bacterial strains. The bacterial tax- 
onomy was determined by matrix-assisted laser desorption ionization 
time-of-flight mass spectrometry (MALDI-TOF MS) using a MALDI-TOF 
BD instrument (BD) and default software. The taxonomy was further 
validated by mapping scaffolds to a 16S rRNA gene database.


Whole-genome sequencing of isolated bacterial strains 

All 24 bacterial strains were cultured in Lysogeny broth overnight 
or until an optical density at 600 nm (OD600) > 1. DNA was extracted
using the phenol-chloroform method. DNA quality control was as 
described above. The TruSeq DNA PCR-Free kit (Illumina) was used to 
make genomic DNA-sequencing libraries; quality control and pooling 
was as described above. Sequencing was performed using the Illumina 
MiSeq (250-bp paired-end reads, Illumina).


In vitro prophage induction and induced VLP sequencing 

Overnight cultures of isolated bacterial strains were diluted 1/100 into 
10 ml medium and grown until log phase (OD600 = 0.6). Mitomycin C
(Sigma-Aldrich) was then added to a final concentration of 5 µg ml⁻¹. The
OD600 values were measured and VLPs were purified after 6 h of culture.
VLPs were purified from the bacterial culture using the same method 
as described for stool VLP purification without the homogenization 
step. The purified VLP DNA was extracted and amplified for virome 
sequencing as described above, and also enumerated by SYBR Gold 
staining. RNA phages are generally not thought to form prophages; 
RNA was interrogated for three samples, and no phages were identified. 


In vitro phage infection 

Overnight cultures of isolated bacterial strains were diluted 1/100 into 
10 ml medium, grown to log phase (OD600 = 0.6), and then 100 µl bacteria
was mixed with 100 µl of serial dilutions of isolated infant stool VLPs
with the addition of MgSO4 (with a final concentration of 10 nM). The
mixture was incubated at 37 °C for 30 min, diluted in 3 ml of warm soft
agar, and then plated on a pre-warmed Lysogeny broth plate. Lambda
phage was used as a positive control. 


16S rRNA gene qPCR 

Bacterial abundance was quantified using qPCR of the V1-V2 region of 
the 16S rRNA gene using a TaqMan-based assay (Applied Biosystems). 
Primer, probe sequences and the PCR program were described previously
and are included in Supplementary Table 6. The reaction was
conducted on a 7500 Fast Real Time qPCR system (Thermo Fisher Scientific).
Triplicate reactions were performed. Results show the mean
values (Supplementary Table 2). The limit of detection in the 16S qPCR
assay was determined to be 20 copies per reaction, which was equal to 
around 2,000 copies per gram faeces. 


Stool VLP sequence read quality control and taxonomic 
classification 

Quality control for the stool VLP reads was performed using the Sunbeam
pipeline with a custom Sunbeam extension
(https://github.com/guanxiangliang/sbx_dedup). In brief, low-quality reads and
adaptor sequences were removed by Trimmomatic, low-complexity
reads were identified and discarded by Komplexity
(https://github.com/eclarke/komplexity) and then duplicate identical sequences
(inferred PCR replicates) were filtered out by BBmap
(https://jgi.doe.gov/data-and-tools/bbtools/). Dereplicated reads were aligned using
BWA to the host (GRCh38 for human genome and GRCm38 for mouse
genome) or phiX174 and removed. The quality-controlled reads were
classified by Kraken using a custom database that included all complete
human, bacterial, archaeal and viral genomes in RefSeq release 89
(released on 9 July 2018), with low-complexity regions masked before 
building the database. 
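
The dereplication step above removes identical sequences that are inferred to be PCR replicates. The following is a minimal sketch of that idea only; it is not the BBTools implementation, which handles read pairs and other cases, and the function name and example reads are hypothetical:

```python
def dereplicate(reads):
    """Keep the first occurrence of each distinct sequence.

    reads: iterable of (read_id, sequence) tuples.
    Sequences are compared case-insensitively (an assumption of this sketch).
    """
    seen = set()
    unique = []
    for read_id, sequence in reads:
        key = sequence.upper()
        if key not in seen:
            seen.add(key)
            unique.append((read_id, sequence))
    return unique


# Hypothetical reads: r2 is an exact duplicate of r1 and is dropped.
example_reads = [("r1", "ACGT"), ("r2", "ACGT"), ("r3", "TTGA")]
unique_reads = dereplicate(example_reads)
```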

To investigate environmental contamination or experimental reagent
contamination, negative control samples were analysed, including 
empty diaper samples, empty stool container samples and reagent-only 
samples. The Decontam package in R was used on the Kraken classification
data to remove contaminating species with the ‘prevalence’ method
at a threshold of 0.5. Taxa including Klebsiella phage 0507-KN2-1,
Choristoneura occidentalis granulovirus, Vibrio phage pYD38-A, Pseudomonas
phage PPpW-4, Burkholderia phage ST79, Burkholderia phage KS9, 
Bacillus virus phi29, Simbu orthobunyavirus, and Shamonda orthobu- 
nyavirus were removed from downstream analysis. 


Stool VLP sequence read assembly and annotation, and phage 
lifestyle prediction 

The quality-controlled reads were assembled into contigs using 
MEGAHIT for each individual. To quantify contigs in each sample,
quality-controlled reads were mapped back to the contigs using Bowtie2,
and the number of mapped reads was calculated by processing
SAM files using custom code. To remove differences in sequencing 
depth, reads per million total reads (RPM) were calculated for each 
contig. Assembled contigs from virome libraries with length larger 
than 3,000 bp were selected to predict open-reading frames (ORFs)
using Prodigal in ‘meta’ mode. The predicted ORFs were mapped
to the viral protein database in UniProt Knowledgebase (TrEMBL and
Swiss-Prot) using BLASTP with E<10>.
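
The RPM normalization described above can be sketched as follows. This is an illustration rather than the custom code used in the study; the function name and read counts are hypothetical, and "total reads" is assumed here to mean the total number of quality-controlled reads in the sample:

```python
def reads_per_million(mapped_reads, total_reads):
    """Normalize per-contig mapped read counts to reads per million total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {
        contig: count * 1_000_000 / total_reads
        for contig, count in mapped_reads.items()
    }


# Hypothetical counts for one sample with 500,000 quality-controlled reads.
counts = {"contig_1": 250, "contig_2": 5}
rpm = reads_per_million(counts, total_reads=500_000)
```

Because each sample is scaled by its own read total, RPM values can be compared between libraries of different sequencing depth.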

To exclude contigs resulting from contamination, we mapped nega- 
tive control sample reads to the built VLP contigs. If the maximal RPM 
of negative control samples for the sample contig was greater than in
the stool samples, then that contig was marked as contamination and 
not used for downstream analysis. 
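
The contamination rule above compares RPM values between negative controls and stool samples for each contig. A minimal sketch follows, assuming that the comparison is against the maximal RPM across stool samples (the text does not state this explicitly); all names and values are hypothetical:

```python
def flag_contaminant_contigs(stool_rpm, control_rpm):
    """Return contigs whose maximal RPM in negative controls exceeds
    their maximal RPM in stool samples.

    stool_rpm, control_rpm: dicts mapping contig name -> list of RPM values.
    """
    flagged = set()
    for contig, stool_values in stool_rpm.items():
        max_control = max(control_rpm.get(contig, []), default=0.0)
        max_stool = max(stool_values, default=0.0)
        if max_control > max_stool:
            flagged.add(contig)
    return flagged


# Hypothetical RPM values: contig_b is more abundant in a control than in stool.
stool = {"contig_a": [5.0, 20.0], "contig_b": [1.0]}
controls = {"contig_a": [2.0], "contig_b": [3.0, 0.5]}
contaminants = flag_contaminant_contigs(stool, controls)
```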

We defined an assembled contig as a viral contig if it had (1) at least 
one viral protein per 10 kb of VLP contig and (2) 50% of the predicted 
ORFs were viral ORFs. The taxonomy of each contig was classified as 
described previously, modified to compile attributions over multiple
reading frames to generate a single taxonomic assignment. The
ORFs were assigned to taxa based on the best-hit viral protein in the 
UniProt Knowledgebase. The majority taxonomic assignment over 
all ORFs within a contig was given to the contig. Contigs that could 
not be assigned to any taxa were classified as ‘Others’. Contigs that 
were not assigned as ‘Bacteriophage’ were mapped to the NCBI nt 
database with a threshold of 80% coverage and 80% identity to further
remove contigs from non-viral genomes. In total, we identified 2,552 
viral contigs among all 20 infants (Extended Data Fig. 2a, b). Con- 
tigs that shared the same taxonomic assignments were collapsed to 
yield pooled RPM values for each taxon. Viral richness was calculated 
by observed species number with RPM > 10. DNA virome reads that 
could be assigned to our set of viral contigs accounted for 11.3% ± 4.7%
(mean ± s.e.m.) at month 0, 31.2% ± 5.6% at month 1 and 37.7% ± 5.4% at
month 4 of all nonhuman reads (Extended Data Fig. 2c–e). Other reads
come from contamination, genomes of other microorganisms and
unassigned categories. We think that some of the unassigned reads represent
unstudied bacteriophages, for which insufficient ORFs matched
the viral ORFs in the database to label the contig as viral. For the RNA 
virome data, assembly from 12 out of 20 infants yielded contigs larger 
than 3,000 bp. The RNA virome reads that could be assigned to viral 
contigs accounted for 4.5% ± 4.7% (mean ± s.e.m.) at month
0, 15.9% ± 13.0% at month 1 and 10.0% ± 5.1% at month 4 of all nonhuman
reads (Extended Data Fig. 2f–h).
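
The viral contig definition, majority-vote taxonomy and richness rule described in this section can be sketched as below. This is an illustration under stated assumptions, not the study's code: the 50% criterion is read as "at least 50%", ties in the majority vote are broken by first occurrence, and all function names and inputs are hypothetical.

```python
from collections import Counter


def is_viral_contig(length_bp, n_orfs, n_viral_orfs):
    """Criteria (1) and (2): at least one viral protein per 10 kb of contig
    and at least 50% of predicted ORFs matching viral proteins."""
    if n_orfs == 0 or length_bp <= 0:
        return False
    viral_per_10kb = n_viral_orfs / (length_bp / 10_000)
    return viral_per_10kb >= 1 and n_viral_orfs / n_orfs >= 0.5


def majority_taxon(orf_taxa):
    """Assign the most common best-hit taxon over all ORFs of a contig.

    orf_taxa: list with one taxon name per ORF, or None when the ORF had no hit.
    """
    assigned = [taxon for taxon in orf_taxa if taxon is not None]
    if not assigned:
        return "Others"
    return Counter(assigned).most_common(1)[0][0]


def viral_richness(taxon_rpm, min_rpm=10):
    """Number of taxa observed with pooled RPM above the threshold."""
    return sum(1 for value in taxon_rpm.values() if value > min_rpm)
```

For example, a hypothetical 5,000-bp contig with 6 ORFs, 3 of them viral, would pass both criteria, whereas a 30,000-bp contig with only 2 viral ORFs out of 10 would fail both.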

Viral contigs were scored as temperate or lytic bacteriophages using 
PHACTS. In order to obtain strong predictions, only viral contigs with
at least 10 ORFs were analysed. Of 2,552 viral contigs, 1,029 were classified
as “Bacteriophages”, contained at least 10 ORFs and were used for the
PHACTS analysis. Ten replicate PHACTS predictions were performed.
Probability values obtained from PHACTS were standardized between
−1 and 1, which was presented as probability of “Lytic” or “Temperate”
(Extended Data Fig. 4b). 

To test the abundance of crAssphages in the infant gut, we mapped 
the stool VLP reads to 37 genomes which belong to the crAssphage
family. At least 33% genome coverage was considered to be a
positive detection (Extended Data Fig. 6). In this analysis, we included
stool VLP sequencing data from a group of older healthy children (2–5
years old, n = 21; Supplementary Table 1).


Profiling human-cell viruses 

Seven viral families that replicate on human cells were detected by 
Kraken. To further investigate the accumulation of these viruses, the 
viral genomes in RefSeq and Viral Neighbour databases that repre- 
sent these families were retrieved from NCBI. The stool VLP sequences 
were mapped to these genomes to estimate genome coverage using 
Bowtie2 with global alignment option. The output SAM files were
processed by Samtools, Bedtools and custom code
(https://github.com/guanxiangliang/liang2019) to quantify the fraction of the genome
covered. We favour use of percent coverage as a metric for genome
detection; amplification during sequence library preparation can
yield many copies of single genome regions, yielding many sequence 
reads but with low genome coverage. Comparisons in several studies 
thus indicate coverage is a more reliable measure. We found that the
negative control samples could contain coverage of up to ~10% of a
viral genome (Extended Data Fig. 7k).
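
The distinction between read count and genome coverage can be illustrated with a short sketch. It is not the Samtools/Bedtools workflow used in the study; it assumes alignments are given as 0-based, half-open (start, end) intervals, and the function name and example alignments are hypothetical:

```python
def genome_coverage_fraction(intervals, genome_length):
    """Fraction of genome positions covered by at least one alignment."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        # Clip alignments to the genome boundaries.
        start, end = max(start, 0), min(end, genome_length)
        if start >= end:
            continue
        if current_end is None or start > current_end:
            # Close the previous merged block and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent alignment: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# 1,000 reads piled on one 150-bp region of a 10-kb genome.
piled = genome_coverage_fraction([(0, 150)] * 1000, 10_000)
# 40 reads tiled across the same genome.
tiled = genome_coverage_fraction([(i * 250, i * 250 + 150) for i in range(40)], 10_000)
```

In this made-up example the piled-up reads cover 1.5% of the genome despite the high read count, whereas the 40 tiled reads cover 60%, which is why coverage is the more conservative detection metric.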


Human-cell virus qPCR 
The numbers of selected human-cell viral genome copies in stool sam- 
ples were determined by qPCR using TaqMan-based assays (Applied 
Biosystems). Primers and probes that target Adenoviruses, Human
Torque teno viruses, Enteroviruses, Astroviruses, Sapovirus GI
strains and Norovirus GII strains were used in this study. All primer
and probe sequences are listed in Supplementary Table 6. The qPCR
reactions were conducted on a 7500 Fast Real Time qPCR system using
TaqMan Fast Advanced Master Mix (Thermo Fisher Scientific) in a final
volume of 20 µl with 900 nM primers and 250 nM probe. All qPCR reactions
were performed without preamplification for either VLP DNA or
RNA. Triplicate reactions were performed and the results showing the
mean values and standard deviations are listed in Supplementary Table 7.
The availability of metagenomic virome sequence data and qPCR 
data allowed assessment of qPCR efficiency given sporadic mismatches 
of viral sequences to qPCR primers. A comparison between virome
sequencing and qPCR data is presented in Supplementary Table 4.


Total microbial shotgun metagenome sequencing read quality 
control and taxonomic classification 

The quality control for the shotgun metagenome sequencing 
reads was performed using the default pipeline in Sunbeam. The
quality-controlled reads were classified by Kraken using the same 
database as was used for stool VLP sequence analysis. To calculate 
the bacterial richness and diversity, 15,000 paired reads were ran- 
domly selected from each sample, and MetaPhlAn2 was used to align 
reads to different levels of bacterial taxonomy. Bacterial richness
was calculated as observed species number, and Shannon diversity 
was calculated using the Vegan package in R. The Decontam package 
in R was used to remove contaminating sequences with the “prevalence”
method at a threshold of 0.5.
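
Richness and Shannon diversity as used above can be written out explicitly. The sketch below is a plain-Python illustration, not the Vegan code path; it uses the natural logarithm, which is the default base in Vegan's diversity function, and the function names and abundances are hypothetical:

```python
import math


def richness(abundances):
    """Number of observed species (non-zero abundances)."""
    return sum(1 for a in abundances if a > 0)


def shannon_diversity(abundances):
    """Shannon index H = -sum(p_i * ln(p_i)) over non-zero proportions."""
    total = sum(abundances)
    if total <= 0:
        return 0.0
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total
            h -= p * math.log(p)
    return h


# Hypothetical community with four equally abundant species: H = ln(4).
even_community = shannon_diversity([10, 10, 10, 10])
```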


Bacterial whole-genome sequence assembly and quality control 
The quality control for the bacterial whole-genome sequence reads was 
performed using Sunbeam without removing low-complexity reads. 
The quality-controlled reads were assembled by SPAdes, followed by
SSPACE to make scaffolds. The quality of scaffolds (completeness and
contamination) was evaluated by CheckM. The assembled scaffolds
revealed good quality for each bacterial strain (Supplementary Table 3). 


Integrated analysis of stool VLP, induced VLP, shotgun 
metagenome and whole-genome sequence data 
To analyse whether stool viruses from the stool VLP sequences matched 
sequences of induced VLPs from purified infant gut bacterial strains, 
reads from induced VLPs and stool VLPs were mapped to the corresponding
bacterial scaffolds using Bowtie2. The number of mapped
reads and bedgraph files for coverage plots were generated using Samtools,
Deeptools and custom code. Induced VLP sequences from
isolated bacterial strains were mapped to the stool virome contigs using 
the same method to assess whether the induced phages from isolated 
bacteria were more similar to stool VLP sequences from infants from 
whom the bacterial strain originated than in VLPs from unmatched 
individuals. The prophage genome annotation was performed by 
PHASTER targeting bacterial genomic scaffolds longer than 100,000 bp.
VLP contigs for analysis were identified as follows. First, we asked 
whether assembled contigs from the induced VLPs of the 24 purified 
bacterial strains could be identified in the 24 sequenced bacterial 
genomes. We required that more than 50% of the induced VLP contig 
length was matched to a bacterial genome scaffold or vice versa (Blastn
with E<107°). Second, we asked whether induced VLP contigs recognized
as candidate prophages encoded proteins that were present
in the UniProt viral protein database. At least 50% of the ORFs were
required to be virus-like proteins (Blastp with E < 10°). Third, for the
induced VLPs, contigs were required to comprise at least 5% of all reads 
for inclusion in the analysis. 
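
The three selection criteria for induced VLP contigs can be summarized as a single filter. The sketch below assumes that the BLAST-derived quantities have already been computed upstream; the class, field names and example values are hypothetical and are not taken from the study's scripts:

```python
from dataclasses import dataclass


@dataclass
class InducedContig:
    name: str
    matched_fraction: float  # fraction of contig length matched to a bacterial scaffold
    n_orfs: int              # number of predicted ORFs
    n_viral_orfs: int        # ORFs with a hit in the viral protein database
    read_fraction: float     # fraction of all induced VLP reads assigned to the contig


def passes_filters(contig):
    """Apply the three criteria: >50% match, >=50% viral ORFs, >=5% of reads."""
    return (
        contig.matched_fraction > 0.5
        and contig.n_orfs > 0
        and contig.n_viral_orfs / contig.n_orfs >= 0.5
        and contig.read_fraction >= 0.05
    )


# Hypothetical contigs: only phage_a satisfies all three criteria.
candidates = [
    InducedContig("phage_a", 0.9, 20, 12, 0.30),
    InducedContig("phage_b", 0.4, 20, 15, 0.30),
    InducedContig("phage_c", 0.8, 20, 5, 0.30),
    InducedContig("phage_d", 0.8, 20, 12, 0.01),
]
selected = [c.name for c in candidates if passes_filters(c)]
```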

To evaluate whether the induced prophages from purified bacteria 
were more similar to stool VLP sequences from infants from which 
the bacterial strain originated than VLPs from unmatched infants, we 
mapped the stool VLP reads to the corresponding induced VLP contigs 
from the same infant (within infants) as well as unmatched infants 
(between infants) using Bowtie2 (Extended Data Fig. 5a). 

Several further analyses were performed to investigate the correla- 
tion between the proportion of each bacterial species in the infant 
gut community and the proportion of prophages from that bacterial 
species in the infant gut virome. 

Stool VLP sequences were mapped to the induced VLP contigs identi- 
fied above using Bowtie2. The proportion of mapped reads from stool 
VLPs were divided by the total number of stool VLP reads to obtain 
the proportion, which represents the abundance of each bacterial 
prophage in the infant gut virome. The proportion of isolated bacteria 
in the infant gut community was represented by the proportion of 
shotgun reads that could be mapped to the isolated bacterial genome 
divided by all nonhuman reads. The abundance of bacterial prophages 
was plotted against the isolated bacterial abundance (Fig. 2d). This 
analysis was conducted using data based on both mitomycin C induc- 
tion (Fig. 2d) and spontaneous induction (Extended Data Fig. 5b). 
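
The relationship between prophage abundance and host bacterial abundance was assessed by rank correlation. As a minimal, dependency-free sketch of Spearman's ρ (average ranks for ties, then Pearson correlation on the ranks), with hypothetical proportions standing in for the real data:

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank-order correlation coefficient."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical proportions: bacterial abundance vs prophage abundance.
bacteria = [0.01, 0.2, 0.05, 0.4]
prophage = [0.001, 0.03, 0.004, 0.02]
rho = spearman_rho(bacteria, prophage)
```

The sketch returns only the coefficient; the P values reported in the figures would come from a statistical package such as R.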


Phage population structure analysis 

To analyse the phage populations, 185 samples pooled from both dis- 
covery and validation US cohorts were analysed. Assembled contigs 
from individual DNA virus libraries with length larger than 3,000 bp 
were selected to predict ORFs as described above. Accurately assigning
taxonomic ranks to viral contigs is still a challenge; therefore, we
performed a taxonomy-independent population analysis. ORFs were
mapped to the Pfam database using HmmScan (HMMER 3.1;
http://hmmer.org/) with E < 10°. Pfam entries that belong to phages, and
those that were shared by phages and bacteria, were selected for further 
analysis. The coordinates of each Pfam entry on the contigs were identi- 
fied by custom code, and VLP reads were aligned to these coordinates 
by featureCounts to evaluate the abundance of each Pfam entry. The
Pfam annotations for each sample were catalogued and a matrix was 
generated for annotation over all samples. Clustering was evaluated 
using Bray–Curtis dissimilarities. Bray–Curtis dissimilarities were plotted
using principal coordinate analysis (PCoA), and differences among
groups (infant age, infant feeding type, infant delivery type, infant
gender, mother body type, formula type, maternal pregnancy-induced
hypertension or diabetes and maternal chorioamnionitis) were tested
using PERMANOVA. Continuous variables (gestational age, infant birth 
weight, household underage number, household number and mother 
pregnancy weight gain) were fit to the PCoA ordination by regression 
using the Envfit function. P values were determined using 999 permutations.
The analysis was carried out using the vegan R package.


Bifidobacterium and Lactobacillus phage analysis 

We downloaded 42 Lactobacillus phage genomes from RefSeq and 
used these for comparison. RefSeq did not contain any Bifidobacterium 
phage genome sequences, but two Bifidobacterium phage genomes 
were available in NCBI (accession numbers GQ141189.1 and MH444512.1) 
and were used for analysis here. Genome coverage was estimated using 
the same method as was used for animal virus coverage analysis. The 
Bifidobacterium and Lactobacillus phages that were highly covered 
(>33%) by sequencing contain annotated ‘Integrase’ proteins, suggest- 
ing temperate replication cycles. 


Quantification and statistical analysis 
Statistical tests were conducted using R. Nonparametric tests were 
used to compare two independent groups (Wilcoxon rank-sum test),
two related groups (Wilcoxon signed-rank test) and multiple groups
(Kruskal–Wallis test with Bonferroni correction). Nonparametric correlation
was performed using Spearman’s rank-order correlation (R represents
Spearman’s ρ). Fisher’s exact test was used to test the difference
between two categorical variables. P values for multiple comparisons
were corrected using the Benjamini–Hochberg FDR method. P < 0.05
or FDR-corrected P < 0.05 was considered significant. All reported P
values are from two-sided comparisons. All acquired data were included 
in analyses. 
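
The Benjamini–Hochberg correction mentioned above can be sketched in a few lines. This is an illustrative re-implementation of the standard procedure rather than the R call used in the study, and the P values are hypothetical:

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values from four comparisons.
raw_p = [0.01, 0.04, 0.03, 0.005]
fdr_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in fdr_p]
```

With these made-up inputs the adjusted values are approximately 0.02, 0.04, 0.04 and 0.02, so all four comparisons would remain below the 0.05 threshold.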


Gnotobiotic mouse control 

As a control for this study, we prepared and analysed VLPs from stool
samples from gnotobiotic mice and found viral sequences that were not
present in contamination controls. In this case, the particles found were
derived from murine endogenous retroviruses, specifically murine
leukaemia virus, which is known to be present in the germ line of the
mouse strain C57BL/6 used here. Evidently endogenous retroviral
particles can be shed into the mouse gut and detected by our methods, 
providing a positive control for our analysis of human samples. No 
additional VLP contigs that passed quality filtering were detected. 


Possible contribution of human endogenous retroviruses
Human endogenous retroviruses (HERVs) are another candidate source 
of viral particles in neonates, and low levels of these sequences could 
be detected in VLP DNA fractions. However, HERV particles contain 
RNA, and HERVs were not detected significantly in RNA VLP fractions 
(Extended Data Fig. 10). Quality-control studies showed that HERV 
DNAs were probably contributed by contaminating human DNA, and 
were present in proportions predicted given the frequencies of other 
human genomic repeated sequences. 


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


Sample information and raw sequences are available in the National 
Center for Biotechnology Information Sequence Read Archive under 
BioProject ID PRJNA524703 (Supplementary Table 8). The isolated bac- 
terial genome sequences have been deposited at DDBJ/ENA/GenBank 
under the accession numbers WVTF00000000–WVUC00000000
(Supplementary Table 3). 


Code availability 


All bioinformatic scripts are available on GitHub
(https://github.com/guanxiangliang/liang2019).
Extended Data Fig. 1 | Overview of total stool microbial shotgun
metagenomic sequencing. a, Percentage of reads mapped to human or
microbial genomes or that were unassigned. The types of DNA detected are
indicated on the right. b, Correlation between the percentage of human DNA
and sampling time after delivery using month-0 samples (n = 20). The
percentage of human DNA is shown on the y axis, and the sampling time after
delivery is shown on the x axis. The black dashed line shows the linear
regression line and the grey-shaded region shows the 95% confidence interval
for the slope. Two-sided Spearman’s rank-order correlation method was used
to test significance (R represents Spearman’s ρ). c, Taxonomic composition of
bacteria at the phylum level. The total read number is shown on the y axis; the x
axis shows different samples. d, Bacterial richness. The y axis shows the
richness calculated as the number of observed species. e, Bacterial diversity.
d, e, A two-sided Wilcoxon rank-sum test was used to test the difference
between different age groups (n = 20 infants at three time points). The
horizontal lines in the box plots represent the third quartile, median and first
quartile; whiskers extend to ±1.5× the interquartile range. The dots represent
the outliers. 
Extended Data Fig. 2 | Summary of virome sequencing of infant stool. a,
Heat map summarizing the representation of the top five most-abundant DNA
viral contigs in each sample. Samples are grouped sequentially by infant on
both the x axis and y axis. The last group of infants on the x axis are
negative control samples. Circularity indicates whether a contig is circular
(orange colour) or not (light-green colour). The heat map colour
infants at three time points were tested. The horizontal lines in box plots
Extended Data Fig. 4 | Life cycles of bacteriophages. a, Diagram of lytic and
lysogenic bacteriophage replication (based on a previous study). Not shown
are additional phage replication strategies, such as chronic infection and
pseudolysogeny. b, Prediction of replication modes from contig sequences
using PHACTS. The x axis shows the probability that a contig belongs to a lytic
or temperate phage predicted by PHACTS. The y axis shows the viral contig
number. In total, 1,029 phage contigs with at least 10 open-reading frames were
used in this analysis. Of 1,029 contigs, 233 were predicted to be lytic and 794
were predicted to be temperate. Probability values obtained from PHACTS 
were standardized between —1 and 1, which was presented as a probability to be 
lytic or temperate. 
Extended Data Fig. 5 | Prophage induction in the early-life virome.
a, Comparison of the extent of sequence alignment of induced VLP sequences
from bacterial strains compared with VLP sequences from stool samples.
Contigs were generated from mitomycin-C-induced VLPs from purified
bacterial strains from stool (n = 33 phage contigs from 16 bacterial isolates),
then VLP reads from faeces were aligned to these contigs and quantified. 
‘Within infants’ indicates matching stool VLPs to induced VLPs from purified 
bacteria for samples all from the same infant. ‘Between infants’ indicates 
alignment of stool VLPs versus induced VLPs from different infants. The 
horizontal lines in box plots represent the third quartile, median and first 
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quartile. The dots represent the outliers. Samples were compared using a 
two-sided Wilcoxon rank-sum test. b, Correlation between the proportion of 
each bacterium inthe infant gut community and the proportion of prophages 
from that bacterial species inthe infant’s gut virome. This plot is based on VLP 
sequences of phages produced by spontaneous induction (n = 42 phage contigs 
from 20 bacterial isolates). This is different from Fig. 2d, whichis based on VLP 
sequences of phages produced after induction with mitomycin C. The black 
dashed line shows the linear regression line and the grey-shaded region shows 
the 95% confidence interval for the slope. The correlation was tested using a 
two-sided Spearman’s rank-order correlation (R represents Spearman’s p). 
Extended Data Fig. 6 | Colonization by crAssphages in different age groups.
The percentage of crAssphage-positive infants (as scored by requiring that the
crAssphage genome was more than 33% covered by sequence reads from stool VLPs).
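
The scoring rule in this caption (a genome counts as present when more than 33% of it is covered by reads) can be illustrated with a minimal sketch. The function names, the half-open interval representation of read alignments and the example numbers are assumptions made for illustration; the caption does not describe how coverage was actually computed.

```python
def coverage_fraction(genome_length, aligned_intervals):
    """Fraction of a genome covered by at least one aligned read.

    aligned_intervals: iterable of (start, end) pairs, 0-based and half-open.
    Overlapping intervals are merged so no base is counted twice.
    """
    covered = 0
    current_end = 0  # right-most position already counted
    for start, end in sorted(aligned_intervals):
        start = max(start, current_end, 0)
        end = min(end, genome_length)
        if end > start:
            covered += end - start
            current_end = end
    return covered / genome_length


def is_positive(genome_length, aligned_intervals, threshold=0.33):
    """Apply the 'more than 33% covered' rule from the caption."""
    return coverage_fraction(genome_length, aligned_intervals) > threshold


# Hypothetical example: a 100-base genome with three read alignments.
example = [(0, 20), (10, 30), (50, 60)]
fraction = coverage_fraction(100, example)  # 30 + 10 bases covered
call = is_positive(100, example)
```

Merging overlaps before counting matters here, because summing raw read lengths would overstate coverage for deeply sequenced regions.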
Extended Data Fig. 7 | Profiling of animal-cell viruses by virome
sequencing. a, c, f, h, Percentage of infants positive for animal cell-associated
viruses using different viral genome coverage cut-offs in the discovery cohort
(a, f) and validation cohort (c, h). The green line shows the data from infants
who were formula fed (a, c) or born by caesarean (C)-section delivery (f, h), and
the yellow line shows the data from infants fed with breast milk or who were
mixed fed (a, c) or were born by spontaneous vaginal delivery (f, h). b, d, g, i,
Two-sided Fisher’s exact test on infant feeding types (b, d) and delivery types
(g, i) using different viral genome coverage cut-offs in the discovery cohort (b,
g) and validation cohort (d, i). The horizontal red line indicates P = 0.05. e, j,
Comparison of the relative abundance of animal-cell viruses between different
feeding types (e) and delivery types (j). The abundance (reads per million total
reads after log transformation) is shown on the y axis. A two-sided Wilcoxon
rank-sum test was used to test the difference. The horizontal lines in box plots
represent the third quartile, median and first quartile; whiskers extend to ±1.5×
the interquartile range. The dots represent the outliers. k, Genome coverage
fraction of negative control samples for animal-cell viruses. The maximal
fraction of animal viral genome coverage for each negative control sample
(n = 25) is shown on the y axis. Different negative control samples are shown on
the x axis. Note that coverage never exceeds 10%. a, b, f, g, n = 20 samples
from the discovery cohort were used; c–e, h–j, n = 125 samples from the
validation cohort were used.
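
As a rough illustration of panels b, d, g and i, the sketch below runs a two-sided Fisher's exact test at several coverage cut-offs and flags which fall below P = 0.05. It is a plain hypergeometric implementation, not the authors' script; the function name is an assumption and the counts are hypothetical placeholders, not data from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x positives in row 1 with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum all tables at least as unlikely as the observed one
    # (small relative tolerance guards against floating-point ties).
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)


# HYPOTHETICAL counts for illustration only (not taken from the paper):
# cut-off -> (formula +, formula -, breast milk +, breast milk -)
hypothetical_counts = {
    0.10: (20, 26, 10, 69),
    0.33: (15, 31, 5, 74),
    0.50: (8, 38, 4, 75),
}
p_by_cutoff = {cutoff: fisher_exact_two_sided(*counts)
               for cutoff, counts in hypothetical_counts.items()}
below_red_line = {cutoff: p < 0.05 for cutoff, p in p_by_cutoff.items()}
```

Repeating the test across cut-offs, as the figure does, shows whether a conclusion depends on one particular coverage threshold.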
Extended Data Fig. 8 | Phage population structure. a, Statistical tests of the
association of clinical variables with phage population structure. Variables are
shown in the first column. P values and FDR-corrected P values are shown in the
second and third columns. All categorized variables, such as infant age, infant
feeding type, infant delivery type, infant gender, mother body type, formula
type, mother pregnancy-induced hypertension or diabetes and mother
chorioamnionitis were tested by PERMANOVA. Continuous variables,
including gestational age, infant birth weight, household underage number,
household number and mother pregnancy weight gain were tested by Envfit.
All samples from both discovery US and validation US cohorts (n = 185) were
used to test infant age effects, and pooled samples at month 3 and month 4
from both discovery US and validation US cohorts (n = 145) were used to test
other variables. b, PCoA plot based on phage Pfam counts per sample, coloured
by infant ages. This analysis is based on the Bray–Curtis dissimilarity index for
all stool samples from both discovery US and validation US cohorts (n = 185).
Negative control samples were not included for Bray–Curtis dissimilarity
assessment and statistical tests. c–e, PCoA plots of phage Pfam components,
coloured by infant feeding types (c), delivery type (d) and infant gender (e).
This analysis is based on pooled samples at month 3 and month 4 from both
discovery US and validation US cohorts (n = 145), and as in a, PERMANOVA was
used to test the differences. FDR-corrected P values are shown.
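
For readers unfamiliar with the Bray–Curtis dissimilarity used in panel b, a minimal sketch of the standard formula, BC = Σ|u_i − v_i| / Σ(u_i + v_i), is given below. The count vectors are hypothetical stand-ins for per-sample Pfam counts; the authors' analysis used R packages listed in the Reporting Summary, not this code.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative count vectors.

    Returns 0.0 for identical profiles and 1.0 for profiles sharing no features.
    """
    if len(u) != len(v):
        raise ValueError("count vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # convention chosen here for two empty samples
    return sum(abs(a - b) for a, b in zip(u, v)) / total


# Hypothetical Pfam count vectors for two samples (illustration only).
sample_a = [6, 7, 4]
sample_b = [10, 0, 6]
dissimilarity = bray_curtis(sample_a, sample_b)  # 13 / 33
```

A pairwise matrix of these values across all samples is the usual input to a PCoA ordination.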
Extended Data Fig. 9 | 16S qPCR before and after VLP purification. Red and
light-blue dots show values before and after purification, respectively, and the
horizontal lines represent the means (n = 20 infants at three time points were
tested). A two-sided Wilcoxon signed-rank test was used to test the difference.
Extended Data Fig. 10 | Percentage of DNA aligning to sequences of HERVs
in each sample. The percentage of HERV sequences in stool VLPs is shown on
the y axis. Sample type and time point (Month 0, Month 1, Month 4, negative
control) is shown on the x axis. The proportion of HERV sequences paralleled
those of long interspersed nuclear elements and short interspersed nuclear
elements, indicating that they are derived from human DNA contamination.
Data are mean ± s.e.m.; n = 20 infants at three time points were tested.
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement

A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly

The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.

A description of all covariates tested

A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons

A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient)
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)

For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.

For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings

For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes

Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Our web collection on statistics for biologists contains articles on many of the points above.


Software and code 


Policy information about availability of computer code 


Data collection    Commercial software was not used for data collection.


Data analysis    qPCR was conducted and analyzed on a 7500 Fast Real-Time qPCR system. Raw sequence data was analyzed in a Linux environment
(Ubuntu 16.04.6 LTS). Software run under Linux included Sunbeam (v2.0), Trimmomatic (v0.36), Komplexity (v0.3.0), BBmap (v38.22),
Bowtie2 (v2.2.6), megahit (v1.1.3), prodigal (v2.6.3), Samtools (v1.7), HMMER (v3.1), Bedtools (v2.25), Blastp (v2.2.31), Blastn (v2.2.31),
SPAdes (v3.12.0), SSPACE (v2.0) and checkM (v1.0.12). RStudio (v1.1.442) was used for downstream and statistical analysis. Software
used in RStudio included R (v3.4.4), MASS (v7.3-49), gtools (v3.8.1), viridis (v0.5.1), viridisLite (v0.3.0), gtable (v0.3.0), ggsci (v2.9), vegan
(v2.5-5), lattice (v0.20-35), permute (v0.9-5), taxonomizr (v0.5.3), data.table (v1.12.2), qiimer (v0.9.4), randomcoloR (v1.1.0),
RColorBrewer (v1.1-2), pheatmap (v1.0.12), tidyr (v0.8.3), reshape (v0.8.8), stringr (v1.4.0), dplyr (v0.8.3), plyr (v1.8.4), scales (v1.0.0),
ggbeeswarm (v0.6.0), ggplot2 (v3.2.1) and phyloseq (v1.22.3). MALDI Biotyper Realtime Classification and Biotyper software (v3.0) was used
for the MALDI-TOF BD instrument.
Sample information and raw sequences are available in the National Center for Biotechnology Information Sequence Read Archive under BioProject ID
PRJNA524703. The isolated bacterial genome sequences have been deposited at DDBJ/ENA/GenBank under the accessions WVTF00000000–WVUC00000000
(Supplementary Table 3). All bioinformatic scripts are available on GitHub (https://github.com/guanxiangliang/liang2019).
Sample size    No initial sample size was calculated for our first cohort (20 infants) due to the exploratory nature of the study. We then sought to validate
findings on the protective effect of breastfeeding, which did not achieve significance in the small initial cohort. We acquired a cohort that we
calculated would likely provide sufficient resolution to yield a significant result given similar effect sizes. Assuming the breastmilk-exposed
group had a proportion of 10% virus-positive samples and our formula-only group had 40% virus-positive samples (a conservative estimate
from our discovery cohort), our collection of 79 breastmilk-fed samples and 46 formula-fed samples would provide a 96% chance of detecting
a significant difference at P = 0.05. For our third cohort (African samples), we followed the power analysis used in assessing our second cohort.
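
The power statement above can be approximated with a short calculation. The form does not say which method the authors used, so the sketch below assumes a two-sided two-proportion z-test under the normal approximation; the function name is illustrative, and the result may differ slightly from the reported 96% depending on the method actually applied.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Normal approximation; the negligible contribution of the opposite
    rejection tail is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standard error under the null hypothesis (pooled proportion).
    p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)
    se_null = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    # Standard error under the alternative (separate proportions).
    se_alt = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return z.cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)


# Values stated in the text: 10% vs 40% positive, 79 vs 46 samples.
power = power_two_proportions(0.10, 0.40, 79, 46)
```

Under these assumptions the estimate lands in the mid-to-high 90% range, which is consistent in magnitude with the figure quoted in the form.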


Data exclusions    No available data were excluded.


Replication    For the analysis of the effects of breastfeeding, three separate cohorts were studied. A protective effect of breastfeeding was found in the
discovery cohort, but it did not achieve significance owing to the small sample size. Two larger validation cohorts showed significant protective
effects of breastfeeding.


Randomization    The observational cohort was segregated into formula-fed and breastmilk-exposed infants. Subjects were chosen to provide as balanced a
representation of the groups as possible given the cohort compositions. Subjects were subsampled randomly within each group where
possible.


Blinding    Blinding was not possible in the context of analyzing breastfeeding.
Laboratory animals    Fecal samples were obtained as discarded animal byproducts from a centralized germ-free mouse facility (Protocol 805449; end
date 8/28/2020). Species Mus musculus, strain C57BL/6, male and female, 10 weeks of age.


Wild animals    No wild animals were used in this study.
Field-collected samples    This study did not involve samples collected from the field.
Ethics oversight    No ethical approval or guidance was required. Fecal samples were obtained as discarded animal byproducts from a centralized
germ-free mouse facility (Protocol 805449; end date 8/28/2020) at the University of Pennsylvania.
Human research participants
Population characteristics    The Infant Growth and Microbiome Study (IGram) was approved by the Committee for the Protection of Human Subjects at The
Children’s Hospital of Philadelphia (IRB 14-010833). African-American women planning to deliver at the Hospital of the University
of Pennsylvania and their infants were enrolled. Study visits were conducted at The Children's Hospital of Philadelphia. A total of
20 healthy, term infants were recruited for the discovery cohort. Stool samples were collected longitudinally at day 0 to 4 days
after birth (meconium samples, Month 0), month 1 (Month 1), and month 4 (Month 4). The participants in an independent
validation cohort had the same inclusion and exclusion criteria as the discovery cohort (only at month 4, n = 86). Metadata
regarding delivery mode, infant feeding and health outcomes were collected by medical chart review and in-person interview by
trained research personnel.

The Microbiome, Antibiotic, and Growth Infant Cohort (MAGIC) Study was approved by the Committee for the Protection of
Human Subjects at Children's Hospital of Philadelphia (IRB 15-022623). The study enrolled children born at Pennsylvania
Hospital, Philadelphia, PA, receiving preventive health care in the CHOP Primary Care Network or participating in private
practices, together with their biological mothers. The distribution of race, ethnicity, and sex of the newborns reflected the
general distribution in the participating sites. All subjects enrolled were less than 120 hours of age, greater than 36 weeks
gestation, greater than 2000 grams, and spent less than 120 hours in the neonatal care unit. Mothers were over the age of 18
and spoke English. A total of 39 healthy, term babies were used for this cohort. Study visits were conducted at Children’s Hospital
of Philadelphia. Stool samples were collected and questionnaires administered at birth and every 3 months until the subject
reached 24 months of age. Stool samples obtained at 3 months of life were used for this cohort. Mother and baby clinical and
metadata were collected via medical chart review and parent questionnaires.

The Botswana Infant Microbiome Study was approved by the Botswana Ministry of Health (IRB HPDME 13/8/1) and Institutional
Review Boards at the University of Pennsylvania (IRB 822692) and Duke University (IRB 319561). Mother-infant pairs (n = 300)
were enrolled within 48 hours of delivery at Princess Marina Hospital and two public clinics in or near Gaborone, Botswana.
Exclusion criteria included maternal age less than 18 years, infant birth weight less than 2000 grams, multiple gestation
pregnancy, and Caesarian delivery. Participants were seen for monthly study visits until the infant was 6 months of age and every
other month thereafter until the infant was 12 months of age. At all visits, a questionnaire was administered and clinical samples
were obtained from the infant and the mother. Metadata, including data regarding infant feeding practices, were collected by
medical chart review and in-person interview by trained research personnel.
Recruitment    The Infant Growth and Microbiome Study (IGram) recruited African-American infants born at the Hospital of the University of Pennsylvania.
Study visits were conducted at The Children's Hospital of Philadelphia. The Microbiome, Antibiotic, and Growth Infant Cohort
(MAGIC) enrolled children born at Pennsylvania Hospital, Philadelphia, PA, receiving preventive health care in the CHOP Primary
Care Network or participating in private practices, together with their biological mothers. The distribution of race, ethnicity, and
sex of the newborns reflected the general distribution in the participating sites. All subjects enrolled were less than 120 hours of
age, greater than 36 weeks gestation, greater than 2000 grams, and spent less than 120 hours in the neonatal care unit. Mothers
were over the age of 18 and spoke English.
The cohorts above are US urban cohorts, which may not perfectly reflect acquisition of viruses in other settings. We therefore
included another cohort from a developing country. The Botswana Infant Microbiome Study recruited mother-infant pairs within
48 hours of delivery at Princess Marina Hospital and two public clinics in or near Gaborone, Botswana. Exclusion criteria included
maternal age less than 18 years, infant birth weight less than 2000 grams, multiple gestation pregnancy, and Caesarian delivery.
Participants were seen for monthly study visits until the infant was 6 months of age and every other month thereafter until the
infant was 12 months of age.


Ethics oversight    The Infant Growth and Microbiome Study (IGram) was approved by the Committee for the Protection of Human Subjects at The
Children’s Hospital of Philadelphia (IRB 14-010833).
The Microbiome, Antibiotic, and Growth Infant Cohort (MAGIC) Study was approved by the Committee for the Protection of
Human Subjects at Children’s Hospital of Philadelphia (IRB 15-012623).
The Botswana Infant Microbiome Study was approved by the Botswana Ministry of Health (IRB HPDME 13/8/1) and Institutional
Review Boards at the University of Pennsylvania (IRB 822692) and Duke University (IRB 319561).
Intestinal health relies on the immunosuppressive activity of CD4+ regulatory T (Treg)
cells. Expression of the transcription factor Foxp3 defines this lineage, and can be
induced extrathymically by dietary or commensal-derived antigens in a process
assisted by a Foxp3 enhancer known as conserved non-coding sequence 1 (CNS1).
Products of microbial fermentation including butyrate facilitate the generation of
peripherally induced Treg (pTreg) cells, indicating that metabolites shape the
composition of the colonic immune cell population. In addition to dietary
components, bacteria modify host-derived molecules, generating a number of
biologically active substances. This is epitomized by the bacterial transformation of
bile acids, which creates a complex pool of steroids with a range of physiological
functions. Here we screened the major species of deconjugated bile acids for their
ability to potentiate the differentiation of pTreg cells. We found that the secondary bile
acid 3β-hydroxydeoxycholic acid (isoDCA) increased Foxp3 induction by acting on
dendritic cells (DCs) to diminish their immunostimulatory properties. Ablating one
receptor, the farnesoid X receptor, in DCs enhanced the generation of Treg cells and
imposed a transcriptional profile similar to that induced by isoDCA, suggesting an
interaction between this bile acid and nuclear receptor. To investigate isoDCA in vivo,
we took a synthetic biology approach and designed minimal microbial consortia
containing engineered Bacteroides strains. IsoDCA-producing consortia increased
the number of colonic RORγt-expressing Treg cells in a CNS1-dependent manner,
suggesting enhanced extrathymic differentiation.


Bile acids are cholesterol-derived molecules that are involved in essen- 
tial physiological processes, including nutrient absorption, glucose 
homeostasis and regulation of energy expenditure’. Upon feeding, 
endocrine signals stimulate the emptying of the gallbladder into the 
duodenum, where bile acids aid in the emulsification of dietary fats”. 
Primary or liver-derived bile acids in mammals are mostly conjugated 
with taurine or glycine, and undergo pervasive deconjugation by micro- 
bial bile salt hydrolases in the small intestine. Although most bile acids 
are transported back into the liver via enterohepatic circulation, a small 
fraction of this pool (roughly 5%) escapes reabsorption in the ileum 
and is subject to further bacterial transformation in the colon, giving 
rise to secondary bile acids”. 

Knowing that the generation of colonic pTreg cells is affected by microbial
metabolites, we screened the major species of deconjugated bile
acids found in mice and humans for their ability to enhance Foxp3
induction in vitro (Fig. 1a–c). Two secondary bile acids, ω-muricholic
acid (ω-MCA) and isoDCA, potently increased the frequency of Foxp3+
cells among naive CD4+ T cells stimulated in the presence of DCs under
suboptimal Treg cell-inducing conditions (Fig. 1d). The differentiation
of pro-inflammatory TH17 cells was not affected by the presence of
either bile acid (Extended Data Fig. 1a), suggesting a specific effect on
the generation of Treg cells.

Because both isoDCA and ω-MCA are isomers of molecules with no
Treg cell-promoting activity, we hypothesized that the spatial orientation
of specific hydroxyl (-OH) groups is required for their effects.
The formation of isoDCA from DCA requires oxidation of the 3α-OH
group to an -oxo intermediate and its subsequent reduction into a
3β-OH group. Despite remaining poorly characterized, the conversion
of β-MCA into ω-MCA has also been reported to generate an -oxo
intermediate. We observed that the 3-oxo-derivative of DCA failed
to potentiate Treg cell induction to the same extent as isoDCA (Fig. 1e).
Similarly, oxidation of the 6α-OH group abolished Treg cell induction by
Fig. 1 | Bacterial epimerization of bile acids generates molecules with
Treg cell-inducing activity. a, b, Types of bile acids (a), their basic structure (b),
and a summary of substitutions around the cholesterol backbone (a). c, Screen
setup: naive CD4+ T cells (5 x 10*) were cocultured with DCs (1x 10°) in
suboptimal Treg cell-inducing conditions (1 ng ml−1 transforming growth factor
(TGF)-β, 1 µg ml−1 CD3 antibody, 100 U ml−1 IL-2) and analysed on day 3 by
fluorescence-activated cell sorting (FACS). The bile acids listed in a were added
at the doses indicated in d, e, g. d, Frequencies of Foxp3+ CD4+ T cells after
exposure to various concentrations of bile acids. e, The 3β-OH group of isoDCA
is required for its Treg cell-inducing effects. Cells were cocultured as described
in c and incubated with 3-oxoDCA or isoDCA at the indicated concentrations.
f, Assessment of cell proliferation. Naive CD4+ T cells were labelled with Cell


ω-MCA (Extended Data Fig. 1b), indicating that microbial epimerization
of bile acids gives rise to metabolites with unique immunomodulatory
properties. Unlike ω-MCA, isoDCA is found at substantial levels
(roughly 50 µM) in the intestinal contents of healthy adult humans,
and its biosynthesis is well characterized. Therefore, we focused on
understanding the biological activity of isoDCA.

The amphipathic nature of bile acids makes these molecules natural
detergents with potentially deleterious effects on cell viability and
proliferation. Given that conditions associated with reduced T-cell
proliferation are conducive to Foxp3 expression, we assessed whether
this mechanism can account for the potentiation of bile acid-mediated
Treg cell differentiation. IsoDCA significantly reduced T cell proliferation
compared with vehicle (Fig. 1f). We observed a similar effect in cells
treated with 3-oxoDCA, which did not promote Treg cell generation in a
dose-dependent manner, indicating that decreased proliferation alone
cannot account for the increased frequency of Foxp3+ T cells. We next
assessed whether DCs were required for isoDCA-mediated potentiation
of Treg cell induction. Neither isoDCA nor 3-oxoDCA increased Treg cell
frequencies when naive T cells were activated by beads coated with
CD3/CD28 antibodies. Rather, both bile acids caused a decrease in the
percentage of Foxp3+ cells (Fig. 1g), which was also accompanied by
reduced proliferation (Extended Data Fig. 1c). Consistent with these
observations, isoDCA still enhanced Treg cell frequencies when naive
T cells lacking the Foxp3 enhancer CNS3 were cocultured with DCs
(data not shown). Together, these results support a requirement for
antigen-presenting cells (APCs) in mediating the Treg cell-inducing
effects of isoDCA, and suggest a T cell-extrinsic immunoregulatory
mechanism that is distinct from those reported for other bile acids.

Bile acids affect mammalian physiology through interactions with 
numerous receptors, including the farnesoid X receptor (FXR)’. We 
therefore used naive CD4"T cells from CD4°" NriIh4“" and CD4"" Nrih4" 
mice as well as DCs from Csfir’ Nrih4 and Csfir’ Nrih4™ mice to 
assess a potential role for FXR in the induction of Treg cells by isoDCA
in vitro. FXR deficiency in T cells did not change Foxp3 induction 
Trace Violet (CTV) and cultured with DCs as in c, in the presence of isoDCA or
3-oxoDCA (100 µM). CTV dilution was assessed on day 3 by FACS. The y-axis
shows the percentage of cells that underwent the indicated number of cell
divisions. g, Effect of bile acids on Treg cell differentiation in the absence of DCs.
Naive CD4+ T cells were activated with CD3/CD28 antibody-coated beads under
suboptimal Treg cell-inducing conditions. IsoDCA and 3-oxoDCA were added at
the indicated concentrations and cells were analysed on day 3 by FACS. Shown
are means ± s.d. of technical replicates (d, n = 4; e–g, n = 3). *P < 0.05, **P < 0.01,
****P < 0.0001 versus vehicle; plus symbol, P < 0.0001 versus isoDCA (paired
concentration) by one-way (d, e, g) or two-way (f) analysis of variance (ANOVA)
followed by a Dunnett’s (d) or Tukey’s (e–g) multiple comparison test. Data are
representative of at least three independent experiments.


in response to isoDCA (Fig. 2a, left panel). By contrast, DCs lacking
FXR generated a higher frequency of Foxp3+ cells at baseline, and
this increase could not be further enhanced by addition of isoDCA
(Fig. 2a, right panel). These results support our finding that isoDCA
acts upon DCs to potentiate the induction of Treg cells, and suggest
that FXR is involved in this process. To rule out effects from potential
differences in the splenic DC population used as APCs, we analysed
their composition in Csf1r" Nr1h4“ mice and wild-type littermate
controls. Although we failed to detect differences in the composition
of the CD11c+ MHC class II+ cell population (mostly DCs) in the spleen
and in other organs (Extended Data Fig. 2b), we did observe higher
numbers of Foxp3+ cells in the large intestine lamina propria (LILP)
of Csf1r’ Nr1h4“ mice (Extended Data Fig. 2e), particularly of the
RORγt+ Foxp3+ subset (Extended Data Fig. 2e). As pTreg cells that arise
in response to microbial antigens are predominantly RORγt+ (refs. 7°”/),
these data suggest that the absence of FXR in the myeloid compartment
facilitates extrathymic generation of Treg cells in vivo.

We next carried out RNA-sequencing (RNA-seq) analysis to comprehensively
assess the effects of isoDCA treatment and FXR ablation in
DCs. DCs exposed to isoDCA showed reduced expression of several
genes related to antigen processing and presentation, including Ciita,
Ctse, H2ab, H2eb and H2dma (Fig. 2b). Genes involved in detecting
and transducing pro-inflammatory cues, such as Tlr7, Tlr12, Nlrc5,
Stat2, Stat6, Irf1 and Irf7, were also downregulated, as were several
genes downstream of interferon signalling. Among the genes induced
by isoDCA treatment, we identified negative regulators of NF-κB,
mitogen-activated protein kinase (MAPK) and cytokine-receptor
signalling, including Nfkbia, Dusp1, Dusp5 and Socs1. Given that the
transcriptional profile imposed by isoDCA suggested an overall
anti-inflammatory state, we tested the effects of this bile acid on the
ability of DCs to prime antigen-specific T cells and to secrete cytokines
in response to microbial cues. IsoDCA treatment decreased the production
of the inflammatory cytokines tumour necrosis factor (TNF)-α
and interleukin (IL)-6 upon agonist stimulation of Toll-like receptors
Fig. 2 | Potentiation of Treg cell generation by isoDCA requires FXR
expression in DCs. a, Effects of FXR deficiency in T cells (wild-type, WT, and
FXR-deficient, ΔFXR) (left) or DCs (right) on Treg cell induction by isoDCA
(50 µM). b–f, RNA-seq analysis of FACS-purified DCs 24 h after exposure to bile
acids. b, Transcriptional profiling of WT DCs treated with isoDCA (50 µM).
Differentially expressed genes (adjusted P-value < 0.05) are in orange.
c, Fold-change (FC) versus FC plot comparing the transcriptional changes
induced by isoDCA treatment (x-axis) and FXR deficiency (y-axis). Genes
downregulated by isoDCA are in blue; genes induced by isoDCA are in orange.
d, Overlap between genes regulated by isoDCA treatment (orange) and FXR
deficiency (blue). e, FC versus FC plot comparing the effects of isoDCA on WT
(x-axis) and FXR-deficient (y-axis) DCs, colour-coded as in c. f, Overlap between
genes regulated by isoDCA in WT (orange) and FXR-deficient (blue) DCs.
g, Differential scanning fluorimetry (DSF) experiment with recombinant
FXR-LBD and bile acids at 1,000-fold (500 µM) or 200-fold (100 µM) excess. Shown


(TLRs) (Extended Data Fig. 3a). We further used a reporter T-cell line,
in which green fluorescent protein (GFP) is under the control of NFAT
(the nuclear factor of activated T cells), and which expresses OT-II, an
MHC-II-restricted, ovalbumin-specific T-cell antigen receptor (TCR).
We found that isoDCA reduced the DC-mediated activation of this cell
line after pulse treatment with ovalbumin (Extended Data Fig. 3b),
indicating broad anti-inflammatory activities of this bile acid.

Given our results implicating FXR in induction of Treg cells by isoDCA,
we compared the transcriptional changes elicited by FXR deficiency
and exposure to bile acid. Genes that were significantly modulated
in response to isoDCA treatment were, by and large, differentially
expressed between FXR-deficient and -sufficient cells (Fig. 2c), indicating
that these two perturbations lead to similar transcriptional
changes in DCs. Notably, more than 54% of genes induced by isoDCA
treatment were also significantly upregulated in FXR-deficient DCs
compared with wild-type cells (Fig. 2c, d), implying that isoDCA may
counteract FXR-mediated transcriptional repression. Although transcripts
downregulated in response to isoDCA were generally present
at lower levels in FXR-deficient DCs relative to wild-type cells (Fig. 2c),
their expression was further decreased when FXR-deficient DCs were
treated with isoDCA (Fig. 2e, f). Thus, while FXR may contribute to driving
the expression of genes downregulated by bile acid treatment, these
results suggest that additional, FXR-independent mechanisms also
regulate these targets. We next sought to characterize the molecular
interaction between FXR and isoDCA. While the naturally occurring FXR
agonist chenodeoxycholic acid (CDCA) induced a substantial shift in
the melting temperature of recombinant FXR ligand-binding domain
is the Boltzmann melting temperature of the FXR-LBD. h, i, Luciferase reporter
assays. Cells expressing a Gal4–FXR-LBD fusion protein were treated with
vehicle, CDCA, or GW4064 alone or combined with isoDCA (100 µM).
j, FRET-based coactivator-recruitment assay. The indicated bile acids were
mixed with glutathione-S-transferase (GST)-tagged FXR-LBD (5 nM),
fluorescein isothiocyanate (FITC)-SRC2-2 coactivator peptide (500 nM) and
GST antibody (5 nM). Shown is the ratio of fluorescence at 520 nm/485 nm.
a, g, j, Displaying means ± s.d. of technical replicates (a, g, n = 3; j, n = 4),
representative of three independent experiments. Data in b–f (n = 3) are from
one experiment. Data in h, i (n = 3) are shown as means ± s.d., pooled from three
independent experiments. Statistical significance determined by a one-way (g)
or two-way (a, h–j) ANOVA followed by Tukey’s or Sidak’s test. *P < 0.05,
***P < 0.001, ****P < 0.0001 vs vehicle; plus symbol shows P < 0.05 versus paired
concentration of agonist; ns, not significant.


(LBD), isoDCA produced a less pronounced change, only evident at a
high ligand-to-protein ratio (Fig. 2g). In line with this observation, CDCA
elicited a robust signal in an FXR luciferase reporter assay, but isoDCA
treatment had no effect (Fig. 2h). These results suggest a distinct mode
of interaction between FXR and isoDCA, and raise the possibility of
functional antagonism of this nuclear receptor. Indeed, we observed
that isoDCA reduced the luciferase reporter signal in response to a
synthetic FXR agonist, GW4064 (Fig. 2i). Corroborating these data,
we found that isoDCA limited CDCA-induced FXR activity in a fluorescence
resonance energy transfer (FRET)-based coactivator-recruitment
assay (Fig. 2j). Together, these findings suggest that antagonizing the
FXR-dependent transcriptional output of APCs may contribute to the
pTreg cell-inducing effects of isoDCA.

We then set out to explore the biological effects of isoDCA in vivo. 
Production of isoDCA from cholic acid involves chemical transforma- 
tions performed by at least two different bacteria. The capacity for 
cleavage of the 7α-hydroxyl group from cholic acid has been observed
in Clostridium scindens, while epimerization of the 3α-hydroxyl group
of DCA was characterized in Ruminococcus gnavus (Fig. 3a). To assess
the effects of bacterial transformation of bile acids in the colon, we
reconstructed the isomerization pathway for isoDCA by inserting
hydroxysteroid dehydrogenases from R. gnavus (Rumgna_02133 and
Rumgna_00694) into Bacteroides thetaiotaomicron (B. theta), a genetically
tractable commensal (B. theta^WT) (Fig. 3b). The active site of the
enzyme encoded by Rumgna_00694 contains a tyrosine residue that is
predicted to be conserved by homology modelling, which we changed
to phenylalanine to create a catalytically dead mutant (B. theta^eCD)
Fig. 3 | Engineering an isoDCA-producing strain of Bacteroides
thetaiotaomicron (B. theta). a, Enzymes involved in isoDCA formation
from DCA. Rumgna_02133 and Rumgna_00694, two hydroxysteroid
dehydrogenases (HSDHs) present in Ruminococcus gnavus, were previously
identified as key enzymes catalysing epimerization of the 3-OH group of DCA
by this bacterium. b, Cloning strategy to reconstitute the pathway for isoDCA
generation in B. theta. Constructs for Rumgna_02133 and Rumgna_00694 were
codon-optimized, put under the control of a strong, constitutive promoter
(Pcon) in B. theta and chromosomally integrated by conjugation. RBS,
ribosome-binding site. c, Rationale for the design of a catalytically dead (eCD)
mutant of Rumgna_00694 (Rg00694). Alignment of partial amino-acid
sequences of bacterial HSDHs: 5epo_7aHSDH from Clostridium absonum;
1fmc_7aHSDH from Escherichia coli; 5gt9_7bHSDH from Collinsella
aerofaciens. A tyrosine (Y, red) found between additional conserved residues
(blue) in the putative active site of HSDHs was mutated to phenylalanine
(Y165F). d, Characterization of the biochemical activity of B. theta^WT and
B. theta^eCD strains by thin-layer chromatography (TLC). Bacteria were grown to
exponential phase and transferred to media containing DCA. Following
incubation for 24 h, media was extracted in ethyl acetate and analysed by TLC.
DCA and isoDCA controls are shown on the four rightmost lanes. B. theta^WT
(three leftmost lanes) converts DCA into isoDCA, while B. theta mutant
(three middle lanes) shows no activity. Data are representative of two
experiments.


(Fig. 3c). As expected, we observed robust production of isoDCA by the 
engineered functional B. theta strain (as assessed by thin-layer chro- 
matography), while conversion of DCA by the corresponding mutant 
strain was undetectable (Fig. 3d and Extended Data Fig. 4a). 
Bacteroides species lack 7α-dehydroxylation activity; therefore, we
assembled an isoDCA-producing consortium by combining C. scindens
with our engineered strains (Fig. 4a, b). Both wild-type and catalytically
dead consortia colonized mice to similar levels (Extended Data
Fig. 5). All colonization conditions increased the percentage of colonic
Treg cells, including the RORγt⁺ cell subset (Fig. 4c and Extended Data
Fig. 6a, b). Despite having similar frequencies of bulk Foxp3⁺ cells,
recipients of the functional consortium showed a significant increase
in the RORγt⁺ pTreg cell subset compared with mice colonized with the
catalytically dead ensemble (Fig. 4d). In agreement with the notion
that Clostridium species and secondary bile acids are indigenous to
the colon, we failed to detect differential levels of RORγt-expressing
pTreg cells in the mesenteric lymph node or small intestine lamina propria
(Extended Data Fig. 6c, d). Although Foxp3⁻ RORγt⁺ CD4⁺ T cells
were robustly induced in the colon of mice conventionalized by faecal
microbiota transplantation (FMT), this effector T cell population was
comparable between recipients of the functional or catalytically dead
consortia both at 10 days and at 4 weeks after colonization (Extended
Data Fig. 6e, f), consistent with our in vitro finding that isoDCA had
no substantial effect on the generation of TH17 cells (Extended Data
Fig. 1a). By contrast, the differences in frequencies of colonic RORγt⁺
pTreg cells persisted at this later time point, although they were less
pronounced (Extended Data Fig. 6g).

To exclude potential effects of bacterial strain background and to 
generalize these initial findings, we engineered two additional species 
of intestinal commensals to produce isoDCA. For this purpose, we generated
functional and catalytically dead strains of Bacteroides fragilis
(B. frag^WT and B. frag^eCD, respectively) and Bacteroides ovatus (B. ova^WT
and B. ova^eCD) using the above strategy. All functional engineered Bacteroides
strains converted DCA into isoDCA in vitro, with B. frag^WT and
B. ova^WT producing more isoDCA compared with B. theta^WT and yielding
similar levels to R. gnavus as measured by liquid chromatography-mass
spectrometry (Extended Data Fig. 7a). No isoDCA was detected
in the culture supernatants of catalytically dead strains. Functional
engineered B. frag and B. ova strains also induced higher frequencies
of colonic RORγt⁺ Treg cells by comparison with their respective catalytically
dead counterparts (Fig. 4e, f and Extended Data Fig. 7b), suggesting
that this effect does not depend on a particular strain background
and is unlikely to be caused by a potential ‘feedback’ of isoDCA on the
bacterium. Notably, isoDCA levels in the caecal contents of animals
colonized with a high-producing strain (B. frag^WT) did not surpass
that of mice receiving FMT (Extended Data Fig. 7c), indicating that our
reconstruction of this enzymatic pathway did not lead to supraphysiological
bile acid levels. Production of short-chain fatty acids (SCFAs)
was comparable between consortia (Extended Data Fig. 8), suggesting
that bile acid transformation did not broadly affect bacterial metabolism,
and that isoDCA-producing bacteria were able to increase the
number of pTreg cells in the presence of other ‘tolerogenic’ metabolites.

Given that the hydroxysteroid dehydrogenases introduced into 
Bacteroides could potentially modify substrates other than DCA, 
we tested the effects of our engineered bacteria in the absence of 
the 7α-dehydroxylating commensal, that is, without C. scindens.
Mono-colonization of germ-free mice with functional or catalyti- 
cally dead B. frag or B. ova resulted in similar frequencies of colonic 
RORγt⁺ pTreg cells (Fig. 4g, h), demonstrating that the biological activity
of our isoDCA-producing consortia depends on the presence of a
DCA-generating bacterium. To confirm that our engineered strains
promote bona fide pTreg cell generation in vivo, we colonized germ-free
CNS1-sufficient (Foxp3“*) and -deficient (Foxp3“"“™) mice with func- 
tional or catalytically dead consortia. By comparison with catalytically 
dead controls, colonization with functional consortia led to much 
higher frequencies of Foxp3⁺ RORγt⁺ pTreg cells in CNS1-sufficient animals
(Fig. 4i, j). CNS1-deficient hosts with genetically impeded pTreg
cell generation showed similarly low frequencies of Foxp3⁺ RORγt⁺
CD4⁺ T cells when colonized with functional or catalytically dead
consortia (Fig. 4i, j). These experiments show that colonization with
isoDCA-producing microbial consortia promotes de novo generation
of colonic pTreg cells.

Previously, pTreg cells were shown to dampen immune responses
during microbial colonization and to support the metabolic function
of the gut microbiota. The establishment of such immunological
tolerance to commensals along with other types of host-microbe interaction
probably evolved around conserved features of microbial communities,
such as their metabolic output. Supporting this notion, we
have found that, in addition to bacterial fermentation products, the
secondary bile acid isoDCA is also a potent inducer of pTreg cells. Using
engineered Bacteroides strains as part of rationally designed, minimal
microbial consortia, we have shown that isoDCA-producing bacteria
promote the generation of pTreg cells in vivo in a CNS1-dependent manner.
IsoDCA limited FXR activity in DCs and conferred upon them an
anti-inflammatory phenotype. Although our data support the involvement
of myeloid-cell-intrinsic FXR activity in the induction of pTreg cells,
Fig. 4 | Defined bacterial consortia containing isoDCA-producing strains
promote generation of pTreg cells in vivo. a, Generation of isoDCA by a
minimal microbial consortium. Enzymatic steps performed by C. scindens and
the engineered Bacteroides sp. (B. thetaiotaomicron, B. fragilis and B. ovatus).
b, Experimental setup. Germ-free mice were colonized with consortia
containing C. scindens plus B. theta^WT or C. scindens plus B. theta^eCD.
Recipients [...] (PBS)) served as references. The immune-cell composition in
the LILP was analysed on day 10 by FACS. c, d, Frequency of Foxp3⁺ (c) and
RORγt⁺ (d) Treg cells. e, f, Frequency of RORγt⁺ Treg cells in germ-free mice
colonized with C. scindens in combination with engineered B. ova (e) or
B. frag (f).


the relative contribution of this and other bile acid-sensing receptors in
mediating the effects of isoDCA on the mucosal immune milieu remains
to be investigated. In conclusion, our findings suggest that microbial
metabolism of endogenous steroids contributes to immunological
balance in the colon.
Methods


No statistical methods were used to predetermine sample size. The 
experiments were not randomized and the investigators were not 
blinded to allocation during experiments and outcome assessment. 


Dendritic-cell isolation

B16 melanoma cells secreting Flt3 ligand (provided by G. Dranoff) were
injected subcutaneously into the left flank of mice to expand splenic
DCs in vivo. Ten to twenty days after tumour injection, spleens were
harvested and dissociated in RPMI 1640 medium containing 1.67 U ml⁻¹
liberase TL (Roche) and 50 μg ml⁻¹ DNase I (Roche) for 20 min at 37 °C
with vigorous shaking (250 r.p.m.). Digested spleens were passed
through a 100-μm strainer and washed in complete RPMI (RPMI 1640
with 10 mM HEPES buffer (ThermoFisher), 1% penicillin/streptomycin
(ThermoFisher), 1% L-glutamine (ThermoFisher) and 10% fetal bovine
serum (FBS, ThermoFisher)). Single-cell suspensions were enriched for
DCs using a MACS CD11c Microbeads Ultrapure isolation kit (Miltenyi
Biotec) according to the manufacturer’s instructions. Purity of DC
enrichment was assessed by FACS analysis (typical purity: greater than
92% CD11c⁺ MHCII⁺ cells).


In vitro assays 

Polarization of T cells. Naive (CD44⁻ CD62L⁺ GFP⁻) CD4⁺ T cells from
Foxp3^GFP mice were FACS-purified from spleen and peripheral (pooled
inguinal, brachial, axial and submandibular) lymph nodes after a
CD4-enrichment step (Dynabeads 11461D, Invitrogen), performed as
per the manufacturer’s instructions. DC (1 × 10⁵) plus naive CD4⁺ T cell
(5 × 10⁴) cocultures were set up in the presence of 1 μg ml⁻¹ of monoclonal
CD3ε antibody (InvivoMAb, Bioxcel). Cholic acid (catalogue number
C19000-000), chenodeoxycholic acid (CO985-000), lithocholic acid
(C1420-000), isolithocholic acid (C10475-00), deoxycholic acid (C1070-000),
isodeoxycholic acid (C1165-000), 3-oxodeoxycholic acid (C1725-000),
ursodeoxycholic acid (C1020-000), α-muricholic acid (C1890-00),
β-muricholic acid (C1895-000), γ-muricholic acid (C1850-000) and
ω-muricholic acid (C1888-000) were all purchased from Steraloids
Inc. (Newport, RI). 6-Oxo-muricholic acid (5β-cholanic acid-3α,7β-diol-6-one)
was produced by the Organic Chemistry Synthesis core at the
Memorial Sloan Kettering Cancer Center (MSKCC). We stored 100 mM
stocks of bile acids in dimethylsulfoxide (DMSO) at −80 °C. For T-cell
activation in the absence of DCs, naive CD4⁺ T cells (5 × 10⁴) were incubated
with mouse T-activator CD3/CD28 Dynabeads (Gibco) at a 1-to-1
bead-to-cell ratio. For assessment of cell proliferation, naive CD4⁺ T cells
were labelled with 5 μM Cell Trace Violet (Invitrogen) according to the
manufacturer’s instructions. Suboptimal Treg-induction conditions
consisted of 1 ng ml⁻¹ recombinant human TGF-β1 (R&D) and 100 U ml⁻¹
recombinant human IL-2 (Biological Resources Branch, NCI). For TH17
differentiation, cells were incubated with 2 ng ml⁻¹ recombinant human
TGF-β1 (R&D) and 20 ng ml⁻¹ recombinant murine IL-6 (PeproTech). All
in vitro polarization assays were carried out in complete RPMI with 10% FBS
(final volume of 200 μl) in flat-bottom 96-well plates (USA Scientific).
Border wells were filled with media only (no cells) to minimize evaporation
during incubation at 37 °C/5% CO₂. On day 3 of culture, cells were
transferred into V-bottom plates (Fisher), pelleted by centrifugation
and incubated with antibody staining mix containing Ghost Dye Red
780 viability dye diluted in PBS for 15 min at 4 °C. For cytokine production
analyses, cells were incubated for 3 h at 37 °C/5% CO₂ in restimulation
media (complete RPMI 1640 with 5% FBS, 50 ng ml⁻¹ PMA (Sigma),
500 ng ml⁻¹ ionomycin (Sigma), 1 μg ml⁻¹ brefeldin A (Sigma) and 2 μM
monensin (Sigma)). Extracellular antigens were stained for 15 min at
4 °C with an antibody staining mix containing Ghost Dye Red 780 viability
dye diluted in PBS. Cells were fixed and permeabilized with BD
Cytofix/Cytoperm for 20 min at 4 °C. Antibodies against intracellular
antigens were diluted in 1x BD Perm/Wash buffer and cells were stained
for 30 min at 4 °C. Cytometry data were acquired on a LSRII digital
cell analyser (Becton Dickinson, NJ). Foxp3 induction was assessed by
expression of GFP reporter protein. 123count eBeads (Invitrogen) were
added at 5,000 beads per sample to quantify absolute cell numbers.


TLR agonist activation. DCs (1 × 10⁵ per well, flat-bottom 96-well plates)
were pretreated with isoDCA or vehicle in complete RPMI 10% FBS for
5 h before stimulation with various TLR ligands (Pam3CSK4, 0.5 μg ml⁻¹;
HKLM, 10⁷ cells ml⁻¹; poly(I:C) and poly(I:C) LMW, 5 μg ml⁻¹; LPS-EK,
5 μg ml⁻¹; ST-FLA, 1 μg ml⁻¹; and ODN1826, 2.5 μM) from the mouse
TLR1-9 agonist kit (Invivogen tlrl-kit1mw) for 18 h at 37 °C/5% CO₂.
TNF-α and IL-6 levels were quantified in cell-free supernatants (roughly
150 μl of 200 μl cell culture after centrifugation) by enzyme-linked
immunosorbent assay (ELISA; eBioscience, catalogue numbers 5017331
and 5017218) according to the manufacturer’s instructions.


Antigen-processing and -presentation assays. DCs (1 × 10⁴) were
pulsed with ovalbumin (Sigma A5503, 1 mg ml⁻¹) or CD3ε antibody
(1 μg ml⁻¹) in the presence or absence of isoDCA for 1 h at 37 °C/5% CO₂
in serum-free complete RPMI. Then, an equal volume of 10% FBS RPMI
was added to a final serum concentration of 5%. After 5 h, we added
1 × 10⁴ reporter T cells (TCR-α/β-null, BW5147 mouse thymoma cells
with NFAT-controlled GFP expression) carrying the OTII TCR, along
with ovalbumin or CD3ε antibody and isoDCA (to keep concentrations
constant) in complete RPMI with 10% FBS. The frequency of GFP⁺ cells
was determined 24 h later by FACS. Reporter cells were not tested for
mycoplasma or subjected to additional validation other than functional 
assessment during experiments. 


RNA-sequencing of dendritic cells 

In vitro DC/T-cell coculture assays in the presence of bile acids were 
carried out as above. Following 24 h of incubation at 37 °C in 5% CO₂,
cells were transferred into V-bottom plates, pelleted by centrifugation
and incubated with antibody staining mix containing Ghost Dye Red
780 viability dye diluted in PBS for 15 min at 4 °C. Approximately 3 × 10⁴
DCs were double-sorted on a FACSAria II (Becton Dickinson, NJ) on
the basis of viability and CD11c/MHC II expression. Purified cells were
resuspended in Trizol and submitted for RNA-sequencing at the Integrated
Genomics Core (iGO) at MSKCC. Samples underwent SMARTer
amplification (Takara) and were sequenced on a HiSeq platform (Illumina)
at a depth of 20 million to 30 million paired-end 50-base-pair
reads per sample. RNA-sequencing reads were aligned to the reference
mouse genome (Gencode m19) using STAR RNA-Seq aligner, and read
counts were obtained with HTSeq-count. The DESeq2 R package
was used to perform differential gene expression analyses between 
groups. A cutoff of 0.05 was set on obtained P-values (adjusted using 
the Benjamini-Hochberg correction for multiple comparisons) to 
define statistically significant genes for each comparison. 
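
As an illustration of the multiple-testing step described above, the following minimal sketch applies the Benjamini-Hochberg adjustment to a set of raw P values and keeps genes below the 0.05 cutoff. It is not the DESeq2 implementation (which also applies independent filtering); the function names and the example gene labels are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that monotonicity is enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(p_by_gene, cutoff=0.05):
    """Genes whose adjusted P value falls below the cutoff."""
    genes = list(p_by_gene)
    adjusted = benjamini_hochberg([p_by_gene[g] for g in genes])
    return [g for g, q in zip(genes, adjusted) if q < cutoff]


# Hypothetical raw P values for four genes.
example = {"geneA": 0.01, "geneB": 0.04, "geneC": 0.03, "geneD": 0.005}
hits = significant_genes(example)
```

With only four tests every gene in this toy input passes; in a genome-wide comparison the adjustment is far more stringent.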


Luciferase assays 

Reporter cells expressing GAL4-LBD fusions for human FXR (Indigo 
Biosciences, IBOO6001) were incubated for 22-24 h in the presence
of CDCA, the synthetic agonist GW4064 (provided with kit) and/or 
isoDCA. Cells were processed according to the manufacturer’s instruc- 
tions, including the recommended step to determine viability using 
the Live Cell Multiplex assay (LCM, Indigo Biosciences). Luminescence 
was read in a GloMax 96 Luminometer (Promega). 


Differential scanning fluorimetry assay 

IsoDCA and CDCA were tested for their ability to alter the melting 
temperature of recombinant FXR-LBD (Invitrogen) using a modified 
version of the protocol in ref. ”’. Briefly, bile acids and FXR-LBD (final 
concentration 500 nM) were diluted in assay buffer (10 mM Tris (pH 8.3), 
0.5 mM EDTA, 100 mM NaCl and 5 mM DTT, added fresh before assay 
from a ×100 frozen stock) and combined into transparent 384-well
plates (Applied Biosystems). SYPRO Orange dye (Protein Thermal Shift
Dye, Applied Biosystems) was added at a final concentration of 1/2,000
in a reaction volume of 20 μl. Temperature-dependent changes in fluorescence
were detected in a QuantStudio 6 Flex real-time quantitative
polymerase chain reaction (qPCR) system (Applied Biosystems). The
equipment was programmed with the following thermal profile: step
1, temperature 25 °C, time 2 min; step 2, temperature 99 °C, time 2 min;
continuous ramp mode; ramp rate for step 1, 1.6 °C s⁻¹; ramp rate for
step 2, 0.05 °C s⁻¹. Data analysis was performed using PTS software
(Applied Biosystems). 
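
The melting temperature reported by this kind of assay comes from fitting a Boltzmann sigmoid to the fluorescence trace. The sketch below is only an illustration of that idea, not the algorithm used by the PTS software: it assumes an idealized curve with no post-peak signal decay, takes the plateaus from the data extremes, and finds Tm by a coarse grid search. It also computes how long the 25 °C to 99 °C ramp takes at 0.05 °C per second. All function names are hypothetical.

```python
import math


def boltzmann(temp_c, f_min, f_max, tm, slope):
    """Boltzmann sigmoid: fluorescence as a function of temperature."""
    return f_min + (f_max - f_min) / (1.0 + math.exp((tm - temp_c) / slope))


def fit_tm(temps, signal, step=0.1, slope_grid=(0.5, 1.0, 1.5, 2.0, 3.0)):
    """Least-squares grid search for Tm (coarse sketch, not a full fit)."""
    f_min, f_max = min(signal), max(signal)
    lo, hi = min(temps), max(temps)
    n_steps = int(round((hi - lo) / step))
    best_sse, best_tm = None, None
    for k in range(n_steps + 1):
        tm = lo + step * k
        for slope in slope_grid:
            sse = sum((boltzmann(t, f_min, f_max, tm, slope) - y) ** 2
                      for t, y in zip(temps, signal))
            if best_sse is None or sse < best_sse:
                best_sse, best_tm = sse, tm
    return best_tm


def ramp_minutes(start_c=25.0, end_c=99.0, rate_c_per_s=0.05):
    """Duration of a continuous temperature ramp, in minutes."""
    return (end_c - start_c) / rate_c_per_s / 60.0


# Synthetic curve with a known Tm, used only to exercise the fit.
temps = [25.0 + 0.5 * i for i in range(149)]
signal = [boltzmann(t, 100.0, 1000.0, 52.3, 1.5) for t in temps]
estimated_tm = fit_tm(temps, signal)
```

A ligand-induced shift would then be the difference between the Tm fitted with and without the bile acid.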


FRET-based FXR coactivator recruitment assay 

IsoDCA and CDCA were tested for their ability to activate FXR in a
cell-free FRET assay using LanthaScreen technology according to
the manufacturer’s protocol (Thermo Fisher, PV4833). Briefly, after
combining diluted bile acids with GST-tagged FXR-LBD, terbium GST
antibody and fluorescently labelled SRC2 in white, flat-bottom 384-well
plates (Greiner Bio-one), the reaction was incubated at room temperature
in the dark under gentle shaking (60 r.p.m.) for 1 h before reading.
Fluorescence detection was carried out in a Tecan Infinite M100 Pro
Microplate reader (Tecan Group, Switzerland) set up according to the
LanthaScreen Terbium Assay Setup guide available at
www.lifetechnologies.com/instrumentsetup. For the first fluorescence reading,
settings were as follows. Wavelength: excitation 332 nm, bandwidth
20.0 nm; emission 485 nm; bandwidth 20.0 nm. Flashes: mode 2 (100 Hz
(20)), settle time 0 ms. Mode: top. Gain: optimal. Z-position: calculated
from well (select well with appropriate substrate). Integration: lag time
100 μs, integration time 200 μs. After a second fluorescence reading
was added to the existing protocol, settings were adjusted to those 
described above, except for the following. Wavelength: excitation 
332 nm, bandwidth 20.0 nm; emission 515 nm; bandwidth 20.0 nm. 
Results were expressed as ratios of fluorescence at 520 nm to 485 nm. 
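
The readout described above reduces to a per-well ratio of the two emission channels, summarized across technical replicates. A minimal sketch follows; the function names and the example readings are hypothetical, and no background subtraction is assumed.

```python
from statistics import mean, stdev


def fret_ratio(em_520, em_485):
    """Ratio of acceptor-channel (520 nm) to donor-channel (485 nm) signal."""
    if em_485 <= 0:
        raise ValueError("donor-channel signal must be positive")
    return em_520 / em_485


def summarize_replicates(readings):
    """Mean and sample s.d. of the ratio over (em_520, em_485) replicate pairs."""
    ratios = [fret_ratio(acceptor, donor) for acceptor, donor in readings]
    return mean(ratios), stdev(ratios)


# Hypothetical raw readings for four technical replicates of one condition.
wells = [(1200.0, 4000.0), (1300.0, 4000.0), (1100.0, 4000.0), (1200.0, 4000.0)]
ratio_mean, ratio_sd = summarize_replicates(wells)
```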


16S amplicon sequencing of intestinal microbiota 

Sample prep, sequencing and OTU clustering were performed by the 
Microbiome Core Lab at Weill Cornell Medicine. DNA extraction from 
caecal contents was carried out with a Maxwell RSC PureFood GMO 
and authentication kit (Promega, AS1600). Preweighed faecal material
was deposited into a PowerBead glass 0.1-mm tube (Qiagen, 13118-50)
and 1 ml of CTAB buffer plus 20 μl of RNase A solution were added.
Samples were homogenized in buffer by gentle vortexing for 10 s and
then incubated for 5 min in a ThermoMixer F2.0 (Eppendorf), shaking
at 1,500 r.p.m. Homogenized samples were then vortexed horizontally
at high speed for 10 min. Samples were centrifuged at 12,700 r.p.m.
in an Eppendorf centrifuge for 10 min and extraction proceeded in an
automated platform. Tubes were transferred to a MaxPrep Liquid Handler
tube rack (Promega) and the Maxwell RSC48 instrument (Promega,
AS8500) was loaded with proteinase K tubes, lysis buffer, elution buffer,
pipetting tips, 96-sample deep-well plate and Maxwell RSC 48 sheath
tips. The instrument was programmed to use 300 μl of sample and transfer
all sample lysate into the Maxwell RSC 48 extraction cartridge for
DNA extraction. Upon completion, the extraction cartridge was loaded
into Maxwell RSC 48 for DNA extraction and elution. DNA was eluted in
100 μl and transferred into 96-well plates. A Quant-iT double-stranded
(ds)DNA high-sensitivity assay (Thermo Fisher, Q33120) was used for
DNA quantification. 16S libraries were generated according to the Earth
Microbiome Project available at
http://press.igsb.anl.gov/earthmicrobiome/protocols-and-standards/16s/.
Briefly, the V4 region was amplified
using a Hot Start PCR mix (ThermoFisher, 13000014). Primers were 
modified from ref. *° (barcodes were moved to the 515F primer) are as 
follows: 515F forward primer, barcoded (XXXXXXXXXXXX),
AATGATACGGCGACCACCGAGATCTACACGCT XXXXXXXXXXXX TATGGTAATT
GT GTGYCAGCMGCCGCGGTAA; 806R reverse primer, CAAGCAGAAGACGGCATACGAGAT
AGTCAGCCAG CC. Cycling conditions for 96-well-plate
thermocyclers were: 3 min at 94 °C; (45 s at 94 °C; 60 s at 50 °C; 90 s at 72 °C)
× 35; 10 min at 72 °C; hold at 4 °C.
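
For planning purposes, the programmed hold times of this cycling profile can be summed as in the short sketch below. It ignores ramping between temperatures and the final 4 °C hold, so the real run is somewhat longer; the function name is hypothetical.

```python
def pcr_hold_time_seconds(initial_s, cycle_steps_s, n_cycles, final_s):
    """Sum of programmed hold times; temperature ramping is not included."""
    return initial_s + n_cycles * sum(cycle_steps_s) + final_s


# 3 min at 94 °C; 35 cycles of 45 s, 60 s and 90 s; 10 min at 72 °C.
total_s = pcr_hold_time_seconds(3 * 60, (45, 60, 90), 35, 10 * 60)
total_min = total_s / 60
```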


PCR products were purified with a MoBio UltraClean PCR clean-up kit 
(catalogue number 12500) according to the manufacturer’s instruc- 
tions. Library concentrations were determined with a Quant-iT dsDNA 
high-sensitivity assay. Verification of library quality and size was per- 
formed using a PerkinElmer LabChip GXII instrument with DNA 1K 
reagent kit (CLS760673). Libraries were normalized and pooled at 2 nM
before sequencing on a MiSeq instrument (Illumina) at a loading concentration
of 5.5 pM with 10% PhiX, paired-end 250 using MiSeq reagent
kit v2, 500 cycles (MS-102-2003). Custom sequencing primers were as
follows: read 1, TATGGTAATT GT GTGYCAGCMGCCGCGGTAA; index,
AATGATACGGCGACCACCGAGATCTACACGCT; read 2, AGTCAGCCAG 
CC GGACTACNVGGGTWTCTAAT. Note that the 5’-adaptor sequence/ 
index sequencing primer has an extra GCT at its 3’-end compared with 
Illumina’s usual index primer sequences. These bases were added to 
the 3’-end of the Illumina 5’-adaptor sequence to increase the melting 
temperature for read 1 during sequencing. 

For analysis, demultiplexed raw reads were processed to generate 
an operational taxonomic unit (OTU) table using USEARCH version 
11.0.667 (ref. 2"). Specifically, forward and reverse reads were merged 
using a maximum of five mismatches in the overlap region, a minimum
sequence identity in the overlap region of 90%, a minimum overlap
length of 16 base pairs, and a minimum merged sequence length of 300
base pairs. PhiX contamination was then removed, followed by quality
filtering based on FASTQ quality scores, with a maximum expected error
number of 1.0. OTU clustering was performed using usearch -cluster_otus
with default settings. Merged (prefilter) reads were mapped to the
OTU sequences to generate the OTU table. Taxonomic classification of
OTU representative sequences was performed using usearch -sintax,
an implementation of the SINTAX algorithm using version 16 of the
Ribosomal Database Project (RDP) training set.
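
The expected-error criterion used for quality filtering can be illustrated as follows: each Phred score Q contributes an error probability of 10^(-Q/10), and a read is kept when the sum does not exceed the threshold. This is a minimal sketch of that calculation rather than the USEARCH code; it assumes Phred+33 encoded quality strings, and the function names are hypothetical.

```python
def expected_errors(quality_string, phred_offset=33):
    """Sum of per-base error probabilities implied by FASTQ quality characters."""
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string)


def passes_quality_filter(quality_string, max_expected_errors=1.0):
    """Keep a read only if its expected number of errors is within the limit."""
    return expected_errors(quality_string) <= max_expected_errors


high_quality = "I" * 10   # ten bases at Q40
low_quality = "+" * 11    # eleven bases at Q10
kept = passes_quality_filter(high_quality)
dropped = not passes_quality_filter(low_quality)
```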


Engineering recombinant Bacteroides strains 

The 3α-HSDH Rg_2133 and 3β-HSDH Rg_00694 genes were synthesized
(GeneWiz) after codon optimization for expression in B. theta. A Golden
Gate assembly kit (NEB) was used to assemble the following fragments
and insert them into plasmid pNBU2 (ref. **) in 5’ to 3’ order: ppWW3806,
ribosome binding site 8 (RBS), Rg_2133 open reading frame, RBS 8, 
Rg_00694 and pWW3810. The assembled construct was transformed 
into Stellar competent cells (Takara) and positive clones were selected 
and verified by Sanger sequencing. The plasmid was purified and 
retransformed into E. coli S17-1 cells for conjugation into B. theta,
B. frag and B. ova. Mid-log Bacteroides cultures and plasmid-containing
E. coli were mixed 10/1, centrifuged and allowed to sit for 20 min before
resuspending the pellet in PBS and spreading on brain heart infusion
(BHI) agar with 10% horse blood containing 25 μg ml⁻¹ erythromycin
and 200 μg ml⁻¹ gentamycin, followed by incubation in an anaerobic
chamber (Coy Labs) for 48 h at 37 °C. Individual colonies were screened 
by Sanger sequencing. For more information on bacterial strains and 
plasmids, see Supplementary Table 1. 


Thin-layer chromatography 

Strains were grown for 24 h or 72 h in yeast casitone fatty acids broth
with carbohydrate (YCFAC) liquid medium (Anaerobe Systems) containing
100 μM DCA. We added 2 g NaCl to 7 ml culture supernatant,
followed by 700 μl 6 M HCl. The supernatant was extracted with 5 ml
ethyl acetate, dehydrated in 2 g MgSO₄ and then extracted again
with ethyl acetate. The organic extract was filtered through 40-μm
nylon filters (Falcon), then dried by vacuum centrifugation in glass
tubes. The pellet was resuspended in methanol, tubes were washed
in an equal volume of acetone and the solvent was again removed
by vacuum centrifugation. The final dried pellet was resuspended
in 100 μl acetone and was spotted onto glass-backed silica plates
(Sigma), which were developed in 70/20/2 benzene/1,4-dioxane/acetic
acid. Plates were stained in 10% w/v CuSO₄, 8% H₃PO₄ and dried over
a hot plate.


Mass spectrometry 

IsoDCA quantification. Bacterial broth was mixed with an equal volume 
of methanol and centrifuged at 21,000g for 20 min. For in vivo quantification,
about 10-30 mg of the dried faecal pellet was extracted in
150 μl of 50% methanol in water and vortexed for 15 min. The mixture
was then spun down at 21,000g for 15 min. The supernatant of bacterial
broth or extracted faecal pellet was analysed using an Agilent
1290 LC system coupled to an Agilent 6530 quadrupole time-of-flight
(QTOF) mass spectrometer with a 2.1 μm, 2.1 × 50 mm Zorbax Eclipse
Plus C18 column (Agilent). Water with 0.05% formic acid (A) and acetone
with 0.05% formic acid (B) were used as the mobile phase at a flow
rate of 0.40 ml min⁻¹ over an 11-min gradient: 0-1 min, 0% B; 1-3 min,
0-40% B; 3-10 min, 40-100% B; 10-11 min, 100-0% B. Reagents were 
mass-spectrometry grade, purchased from Fisher Scientific. All data 
were collected in negative-ion mode. 
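
The gradient above can be written as a table of breakpoints, from which the mobile-phase composition at any time point follows by interpolation. The sketch below assumes linear ramps between the stated breakpoints, which the text implies but does not state; the names are hypothetical.

```python
# (time in min, % B) breakpoints taken from the gradient description.
GRADIENT = [(0.0, 0.0), (1.0, 0.0), (3.0, 40.0), (10.0, 100.0), (11.0, 0.0)]


def percent_b(t_min, gradient=GRADIENT):
    """Percentage of solvent B at time t_min, assuming linear segments."""
    if not gradient[0][0] <= t_min <= gradient[-1][0]:
        raise ValueError("time is outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


mid_ramp = percent_b(6.5)   # halfway through the 3-10 min segment
```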


SCFA quantification. Caecal samples (roughly 120 mg) were weighed 
into 2-ml microtubes containing 2.8-mm ceramic beads (Omni International)
and resuspended to a final concentration of 100 mg ml⁻¹
using 80/20 methanol/water containing acetate-d3, propionate-d5,
butyrate-d7 and valerate-d9 as internal standards (Cambridge Isotope
Laboratories). Homogenization was carried out using a Bead Ruptor
(Omni International) at 5.4 m s⁻¹ for 3 min at 4 °C. Samples were centrifuged
for 20 min at 20,000g at 4 °C, and 100 μl of caecal extract was
added to 100 μl of 100 mM borate buffer (pH 10). Subsequently, 400 μl
of 100 mM pentafluorobenzyl bromide (Thermo Scientific) diluted
in acetonitrile (Fisher) and 400 μl of cyclohexane (Acros Organics)
were added and reaction vials were sealed. Samples were heated to
65 °C for 1 h with agitation and then cooled to room temperature and
centrifuged to promote phase separation. We transferred 100 μl of
the cyclohexane (upper) phase to a new autosampler vial and carried
out analysis at 1/10 and 1/100 dilutions (using cyclohexane). A calibration
curve and quality-control samples were prepared in borate buffer
covering the range 0.1-50 mM. Analysis by gas chromatography-mass
spectrometry (GC-MS) was performed using an Agilent 7890A gas
chromatograph and Agilent 5975C MS detector operating in negative chemical
ionization mode. Methane was used as the chemical-ionization reagent gas
at 2 ml min⁻¹, and a 1 μl splitless injection was made onto a HP-5MS
column (30 m × 0.25 mm, 0.25 μm; Agilent Technologies). For SCFA
quantification, the raw peak areas of acetate (m/z 59) and propionate
(m/z 73) were normalized to acetate-d3 (m/z 62) and propionate-d5
(m/z 78) internal standards respectively; the C4 compounds butyrate
and isobutyrate (m/z 87) were normalized to butyrate-d7 (m/z 94) and
the C5 compounds 2-methylbutyrate, valerate and isovalerate (m/z 101)
were normalized to valerate-d9 (m/z 110). Data analysis was performed 
with Agilent MassHunter quantitative analysis software (version 10.1, 
Agilent Technologies). 
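
The internal-standard normalization described above amounts to dividing each analyte peak area by the peak area of its assigned deuterated standard. The sketch below encodes the analyte-to-standard assignments given in the text; the function name and the example peak areas are hypothetical, and conversion to concentration via the calibration curve is not shown.

```python
# Analyte-to-internal-standard assignments as described in the text.
INTERNAL_STANDARD = {
    "acetate": "acetate-d3",
    "propionate": "propionate-d5",
    "butyrate": "butyrate-d7",
    "isobutyrate": "butyrate-d7",
    "2-methylbutyrate": "valerate-d9",
    "valerate": "valerate-d9",
    "isovalerate": "valerate-d9",
}


def normalize_peak_areas(analyte_areas, standard_areas):
    """Divide each analyte peak area by the area of its internal standard."""
    return {
        analyte: area / standard_areas[INTERNAL_STANDARD[analyte]]
        for analyte, area in analyte_areas.items()
    }


# Hypothetical raw peak areas from one sample.
analytes = {"acetate": 8.0e6, "butyrate": 1.5e6, "isovalerate": 2.0e5}
standards = {"acetate-d3": 4.0e6, "propionate-d5": 1.0e6,
             "butyrate-d7": 1.0e6, "valerate-d9": 4.0e5}
normalized = normalize_peak_areas(analytes, standards)
```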


Colonization with engineered consortia 

Frozen stocks of Clostridium scindens (ATCC, catalogue number 35704) 
or engineered Bacteroides strains were streaked on Columbia agar 
plates (BD) and grown at 37 °C inside an anaerobic chamber. Bacteria 
were scraped from agar plates with bacteriological loops into sterile 
anaerobic PBS. Consortia were assembled and transported in airtight 
tubes to the animal facility and administered to experimental animals 
by oral gavage. Mice were housed in flexible PVC isolators (Park Bios- 
ervices) or sentry sealed positive pressure cages (SPP, Allentown) for 
the duration of the experiments. For experiments in SPP cages, animals 
were manipulated with sterile gloves using aseptic techniques inside 
biosafety cabinets. 


Isolation of cells from intestinal lamina propria 
Approximately 12 cm of the distal small intestine and the combined 
caecum and colon were processed as the small intestine and large
intestine, respectively. After removal of adherent adipose tissue
and resection of Peyer’s patches, intestines were opened longitudinally
and shaken vigorously in PBS to release contents. Tissues
were incubated in 25 ml intestinal intraepithelial lymphocyte (IEL)
solution (1x PBS with 2% FBS (ThermoFisher), 10 mM HEPES buffer
(ThermoFisher), 1% penicillin/streptomycin (ThermoFisher), 1%
L-glutamine (ThermoFisher), plus 1 mM EDTA (Sigma) and 1 mM dithiothreitol
(DTT; Sigma) added immediately before use) for 15 min at
37 °C with vigorous shaking (250 r.p.m.). Intestines were removed
from IEL suspension, rinsed in PBS and transferred into 50-ml tubes
containing 3x one-quarter-inch ceramic beads (MP Biomedicals) and
25 ml collagenase solution (1x RPMI 1640 with 2% FBS (ThermoFisher),
10 mM HEPES buffer (ThermoFisher), 1% penicillin/streptomycin
(ThermoFisher), 1% L-glutamine (ThermoFisher), 1 mg ml⁻¹ collagenase
A (Sigma) and 1 U ml⁻¹ DNase I (Sigma)). Following incubation for
30 min at 37 °C with vigorous shaking (250 r.p.m.), digested lamina
propria samples were passed through a 100-μm strainer and centrifuged
to remove collagenase solution. The lamina propria fractions
were washed by centrifugation (5 min at 450g) in 44% Percoll
(ThermoFisher) in PBS to remove debris and excess epithelial cell 
contamination before ex vivo restimulation for analysis of cytokine 
production. Foxp3 expression was assessed by detection of the 
fluorescent GFP reporter protein or intracellular staining using a 
Foxp3/transcription factor staining buffer set (eBioscience). 123count 
eBeads (Invitrogen) were added at 5,000 beads per sample to deter- 
mine cell numbers. Cytometry data were acquired on an LSRII (Becton 
Dickinson, NJ). 
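
The bead-based cell count described above can be illustrated with a short calculation. This is a minimal sketch of the usual counting-bead arithmetic, not the authors' analysis script: the function name and the example event numbers are hypothetical, and only the figure of 5,000 beads per sample comes from the text.

```python
def absolute_cell_count(cell_events, bead_events, beads_added=5000):
    """Estimate the number of cells in a sample from counting-bead events.

    cell_events: events recorded in the cell gate of interest
    bead_events: events recorded in the bead gate of the same acquisition
    beads_added: beads spiked into the sample (5,000 per sample in the text)
    """
    if bead_events <= 0:
        raise ValueError("no bead events recorded; count cannot be estimated")
    # The acquired fraction of the sample is bead_events / beads_added,
    # so the total cell number is cell_events divided by that fraction.
    return cell_events * beads_added / bead_events


# Hypothetical acquisition: 20,000 gated cells and 2,500 bead events.
print(absolute_cell_count(20_000, 2_500))
```

Because cells and beads are acquired from the same tube, the estimate does not depend on the volume that the cytometer actually sampled.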


Mice 

Germ-free C57BL/6 mice were purchased from Taconic and maintained
in semirigid isolators (Park Bioservices) at Boehringer
Ingelheim (Ridgefield, CT). Female mice were used in all experiments
to facilitate distribution of animals into experimental
groups. Germ-free Foxp3GFPΔCNS1 and Foxp3GFP mice were rederived
as described and maintained in flexible isolators (Class Biologically
Clean, Madison, WI) at Weill Cornell Medicine. Male and female
littermate mice were used in all experiments. Animals were fed with
autoclaved 5KA1 chow. Germ-free status was routinely checked by
aerobic and anaerobic cultures of faecal samples for bacteria and
fungi and by PCR of faecal DNA samples for bacterial 16S and fungal/yeast
18S genes. Germ-free mice were at least eight weeks old at the
start of experiments.

Specific-pathogen-free mice were housed at the Research Animal
Resource Center for MSKCC and Weill Cornell Medicine with 12-h light/dark
cycles under ambient conditions and ad libitum access to food
and water. Experimental mice were maintained on a standard rodent
diet (5053, LabDiet). F. Gonzalez (NIH, USA) provided the Nr1h4fl mouse
strain; Csf1rCre mice were provided by F. Geissman at MSKCC; CD4Cre mice
(Tg(Cd4-cre)1Cwi) were purchased from Jax laboratories and maintained
in house. Experimental littermate animals were generated by mating
mice homozygous for the Nr1h4fl allele, with one of the breeders (male
or female) carrying one copy of the Cre-driver gene. Cells from male
and female CD4Cre Nr1h4fl/fl or Csf1rCre Nr1h4fl/fl mice were used for in vitro
experiments. Male Csf1rCre Nr1h4fl/fl mice were analysed at six to eight
weeks of age.

All studies were carried out under protocol 08-10-023 and approved 
by the Sloan Kettering Institute Institutional Animal Care and Use Com- 
mittee. Germ-free mice housed at Boehringer Ingelheim were main- 
tained under a protocol approved by the Institutional Animal Care and 
Use Committee of Boehringer Ingelheim Pharmaceuticals. All animals 
used here had no previous history of experimentation and were naive 
at the time of analysis. 


Statistical analyses 
Statistical tests were performed with GraphPad Prism version 7.0. 


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


RNA-sequencing and 16S amplicon sequencing data are available under
BioProject (https://www.ncbi.nlm.nih.gov/bioproject/) identification
codes PRJNA600898 and PRJNA600979. Source data for Figs. 1–4 and
Extended Data Figs. 1–8 are available as .xls tables with the paper. Other
relevant data are available from the corresponding authors upon
reasonable request.
Extended Data Fig. 1 | Effects of iso- and oxo-bile acids on T cell
differentiation and proliferation. a, Effects of Treg cell-inducing bile acids on
the in vitro generation of TH17 cells. Naive CD4⁺ T cells were activated by DCs in
TH17-polarizing conditions (2 ng ml⁻¹ TGF-β, 1 µg ml⁻¹ CD3 antibody and
20 ng ml⁻¹ IL-6). On day 3, cocultures were restimulated with phorbol myristate
acetate (PMA) and ionomycin in the presence of brefeldin A and monensin for 3 h
before FACS analysis of IL-17 production. b, The 6β-OH group of ω-MCA is
required for its Treg cell-inducing activity. Naive CD4⁺ T cells were activated by
DCs in suboptimal Treg cell-inducing conditions (1 ng ml⁻¹ TGF-β, 1 µg ml⁻¹ CD3
antibody and 100 U ml⁻¹ IL-2) and exposed to ω-MCA or 6-oxoMCA at the
indicated concentrations. Foxp3 induction was assessed by FACS on day 3.
c, Assessment of cell division in the presence of isoDCA and 3-oxoDCA
(100 µM). Naive CD4⁺ T cells were labelled with Cell Trace Violet and activated
with CD3/CD28 antibody-coated beads in the presence of TGF-β and IL-2 for
three days before FACS analysis. Data shown are means ± s.d. of replicates
(a–c, n = 3). Statistical significance determined by one-way (a, b) or two-way (c)
ANOVA followed by a Dunnett’s (a) or Tukey’s (b, c) multiple comparisons test.
*P < 0.05, **P < 0.01, ***P < 0.001 versus vehicle; hash symbol, P < 0.05 versus ω-MCA
(paired concentration); plus symbol, P < 0.05 versus isoDCA (paired
concentration); ns, not significant. Data are representative of at least two
independent experiments.
Extended Data Fig. 2 | Characterization of mice with FXR deficiency in the
myeloid compartment. a–e, WT (Csf1rWT Nr1h4fl/fl) and DCΔFXR (Csf1rCre Nr1h4fl/fl)
littermate mice were analysed between six and eight weeks of age. a, Gating
strategy and b, quantification of conventional (c)DC1s (live CD45⁺ Lin⁻ (dump:
CD90, CD3; CD64⁻, Ly6C⁻, Siglec-F⁻) CD11c⁺ MHC class II⁺ CD11b⁻ XCR1⁺) and
cDC2 (live CD45⁺ Lin⁻ (dump: CD90, CD3; CD64⁻, Ly6C⁻, Siglec-F⁻) CD11c⁺ MHC
class II⁺ CD11b⁺ XCR1⁻) in the spleen (Spl), mesenteric lymph node (MLN) and
LILP. c, Gating strategy for d, e. d, Number of total Foxp3⁺ Treg cells in the
indicated organs. e, Quantification of RORγt⁺ Foxp3⁺ Treg cells in the LILP. Data
shown are means ± s.d. (n = 5), representative of two independent cohorts of
mice. Statistical significance determined by a two-tailed t-test. **P < 0.01.
Extended Data Fig. 3 | Anti-inflammatory effects of isoDCA treatment on
DCs. a, DCs (1 × 10⁵) were stimulated for 18 h with various TLR agonists (x-axes)
in the presence or absence of 50 µM isoDCA. Levels of the indicated cytokines
(TNF-α and IL-6) in the culture supernatant were determined by ELISA. b, DCs
(1 × 10⁴) were pulsed with ovalbumin (OVA, 1 mg ml⁻¹) in the presence of various
concentrations of isoDCA for 1 h in serum-free medium and allowed to process
antigen for 4 h in complete medium before addition of an NFAT-GFP reporter
cell line expressing the MHC-II-restricted OT-II TCR recognizing the
ISQAVHAAHAEINEAGR peptide of OVA. The frequency of GFP⁺ cells was
determined by FACS analysis after 24 h. Cocultures treated with CD3 antibody
(1 µg ml⁻¹) served as controls for DC-dependent, antigen-processing-independent
effects of isoDCA on the activation of reporter cells. Activation
with CD3/CD28 antibody beads in the presence of isoDCA served as a control
for DC-independent effects on reporter gene expression. Shown are
means ± s.d. of replicates in a and fold-change relative to vehicle (0 µM isoDCA)
within each condition (OVA, CD3 or CD3/CD28 antibody-coated beads) in b.
Statistical significance in a was determined by multiple t-tests using the
Holm-Šídák correction method with α = 0.05. ****P < 0.001 versus vehicle. Statistical
significance in b was determined by a two-way ANOVA followed by Dunnett’s
multiple comparisons test. *P < 0.05; ****P < 0.001 versus vehicle in each
condition. Data are representative of three independent experiments.
[Plot labels: IL-6; DC + OVA (1 mg/mL), DC + αCD3 (1 µg/mL), αCD3/CD28 beads; x-axis, [isoDCA] (µM).]
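
The Holm-Šídák correction named in this legend is a step-down adjustment for multiple comparisons. The sketch below is a generic, textbook implementation written for illustration. It is not taken from the paper, it may differ in detail from the routine in GraphPad Prism, and the function name and example P values are hypothetical.

```python
def holm_sidak(p_values, alpha=0.05):
    """Return (adjusted_p, reject) lists in the original order of p_values.

    Step-down Holm-Sidak: the k-th smallest P value (k = 0, 1, ...) is
    adjusted as 1 - (1 - p) ** (m - k), and adjusted values are forced to be
    non-decreasing along the sorted order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        remaining = m - rank  # hypotheses not yet stepped past
        adj = 1.0 - (1.0 - p_values[idx]) ** remaining
        running_max = max(running_max, adj)  # enforce monotonicity
        adjusted[idx] = min(1.0, running_max)
    reject = [a <= alpha for a in adjusted]
    return adjusted, reject


# Hypothetical raw P values from three t-tests.
adjusted, reject = holm_sidak([0.01, 0.04, 0.03], alpha=0.05)
print(adjusted, reject)
```

Comparing each adjusted value with α is equivalent to the sequential rule of stopping at the first comparison that fails its Šídák threshold.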
Extended Data Fig. 4 | Liquid chromatography-mass spectrometry
(LC-MS)-based analysis of isoDCA production by engineered B. theta
strains. Bacteria were grown to exponential phase and transferred to media
containing DCA. Following incubation for 24 h, media was extracted with
methanol and supernatants were analysed by liquid chromatography-mass
spectrometry (LC-MS). Shown are traces for spike-in controls with DCA and
isoDCA standards, and for media conditioned by B. theta eWT, B. theta eCD or the
parental, unmanipulated B. theta WT strain VPI-5482. Data are representative of
two independent experiments carried out in triplicate.
Extended Data Fig. 5 | Analyses of microbial community composition in
gnotobiotic and conventionalized mice. GF mice were gavaged with WT or
CD engineered consortia (C. scindens plus B. theta eWT or C. scindens plus
B. theta eCD). Recipients of an FMT or noncolonized mice (PBS) served as
references. The OTU composition of the caecal microbiota on day 10
post-colonization was determined by 16S sequencing. Shown are total read
counts (left) and relative abundances (right) of bacteria in individual
experimental mice, with data pooled from two independent experiments
(n = 10).
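
The relationship between the two quantities plotted in this figure, total read counts and relative abundances, can be shown with a small example. This is a sketch under stated assumptions: the function name and the read counts are hypothetical and are not data from the paper.

```python
def relative_abundance(read_counts):
    """Convert per-taxon read counts for one sample into fractions of the total."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: count / total for taxon, count in read_counts.items()}


# Hypothetical 16S read counts for one mouse given a two-member consortium.
sample = {"C. scindens": 3000, "B. theta": 7000}
print(sum(sample.values()))        # total read count for the sample
print(relative_abundance(sample))  # fractions that sum to 1
```

Reporting both values is useful because relative abundances alone hide differences in sequencing depth or bacterial load between mice.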
Extended Data Fig. 6 | Effects of isoDCA-producing consortia on colonic
lymphocytes. GF mice were gavaged with engineered consortia (C. scindens
plus B. theta eWT or C. scindens plus B. theta eCD), PBS or a complex microbial
community (FMT) as in Fig. 4b. a–g, Immune cell composition in the LILP was
analysed by FACS on day 10 (D10; a–e) or day 30 (f, g) post-colonization.
a, b, Frequencies of total Foxp3⁺ (a) and RORγt⁺ Foxp3⁺ (b) Treg cells among
CD45⁺ cells. c, d, Frequency of RORγt⁺ Foxp3⁺ cells in the MLN (c) and small
intestine lamina propria (SILP, d). e–g, Frequency of RORγt⁺ cells among Foxp3⁻
CD4⁺ T cells (e, f) and Foxp3⁺ CD4⁺ T cells (g). Data shown are means ± s.d.
(n = 10), pooled from two independent experiments. Statistical significance
determined by one-way ANOVA followed by Tukey’s multiple comparisons test.
**P < 0.01; ***P < 0.001; ****P < 0.0001; ns, not significant.
Extended Data Fig. 7 | IsoDCA production by engineered Bacteroides sp.
strains. a, Quantification of isoDCA production by engineered and reference
strains in vitro. Bacteria were grown to exponential phase and transferred to
medium containing DCA. Following incubation for 24 h, medium was extracted
with methanol and supernatants were analysed by LC-MS. AUC, area under
curve. b, c, GF mice were colonized with consortia containing either the
engineered strain of B. frag capable of producing isoDCA or the catalytically
dead mutant in combination with C. scindens (C. scindens plus B. frag eWT
and C. scindens plus B. frag eCD, respectively). Recipients of an FMT and
noncolonized mice (PBS) served as references. Immune-cell composition and
isoDCA quantification were performed 10 days post-colonization. b, FACS
analysis of the frequency of RORγt⁺ Foxp3⁺ CD4⁺ T cells in the LILP.
c, Quantification of isoDCA in caecal contents. Faecal material was weighed,
homogenized and extracted with methanol for LC-MS analysis. In a, c, the AUC
is normalized by the weight of the input material. Shown are means ± s.d.
(a, n = 3; b, n = 10; c, n = 5). Data in a, c are representative of two independent
experiments. Data in b are pooled from two independent experiments.
Statistical significance determined by a one-way ANOVA followed by Tukey’s
multiple comparisons test. *P < 0.05; ND, not detected; ns, not significant.
Extended Data Fig. 8 | SCFA production by minimal, defined microbial
consortia. GF mice were colonized with consortia containing either the
engineered strain of B. frag capable of producing isoDCA or the catalytically
dead mutant in combination with C. scindens (C. scindens plus B. frag eWT and
C. scindens plus B. frag eCD, respectively). Recipients of an FMT and
noncolonized mice (PBS) served as references. Caecal content material was
weighed, homogenized and subjected to organic solvent extraction for
GC-MS-based quantification of SCFA levels. Shown are means ± s.d. (n = 6),
with data pooled from two independent experiments. Statistical significance
determined by a one-way ANOVA followed by Tukey’s multiple comparisons
test. ****P < 0.0001; ns, not significant.
Statistics


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient)
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 


Data collection FACS Diva v 8.0 (BD), MiSeq Control Software v 2.6.2.1 (Illumina), MiSeq Real Time Analysis v 1.18.54 (Illumina), bcl2fastq v 2.17.1.14
(Illumina), Protein Thermal Shift v 1.3 (Applied Biosystems)


Data analysis RNA-seq data were processed and analyzed using an in-house pipeline (available upon request): STAR v 2.4.2. 16S sequencing data were
analyzed by the Microbiome Core Facility at Weill Cornell Medical College (pipeline available upon request): USEARCH v 11.0.667,
Ribosome Database Project 16S training set v 16, Phyloseq v 1.30.0. FlowJo v 10.6.1, GraphPad Prism v 7.0d and MassHunter v 10.1 (Agilent) were also used.


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 


Data 


Policy information about availability of data 


All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 
- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability


RNA-seq data was submitted to NCBI under BioProject Accession #: PRJNA600898 
16S sequencing data was submitted to NCBI under Bioproject accession #: PRJNA600979 
Field-specific reporting


Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection. 


x Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences 


For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf 


Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size No statistical methods were used to predetermine sample size. Sample sizes were determined by magnitude and consistency of measurable 
differences. Animal numbers used are given in the figure legends. 
Data exclusions No data were excluded.


Replication Experiments are representative of 2-3 replicates with similar results, or are data pooled from 2-3 replicates as described in the figure legends. 
Attempts at replication were successful. 


Randomization Mice were randomly assigned to experimental groups at the beginning of treatments. 


Blinding Investigators were not blinded during group allocation and data analysis. Obvious morphological differences in the cecum of germ-free versus 
colonized mice make blinding difficult. 


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems Methods 
n/a | Involved in the study n/a | Involved in the study 
Antibodies ChIP-seq 
Eukaryotic cell lines Flow cytometry 
Palaeontology MRI-based neuroimaging 


Animals and other organisms 


Human research participants 


Clinical data 


Antibodies 


Antibodies used Rat monoclonal anti-mouse CD19 (BUV395) BD Biosciences Cat#563557; RRID: AB_2722495; Clone: 1D3; Dilution 1:500 

Rat monoclonal anti-mouse CD45R (BUV496) BD Biosciences Cat#564662; RRID: AB_2722578; Clone: RA3-6B2; Dilution 1:500 
Rat monoclonal anti-mouse CD8a (BUV737) BD Biosciences Cat#564297; RRID: AB_2722580; Clone: 53-6.7; Dilution 1:800 

Rat monoclonal anti-mouse Siglec-F (BV421) BD Biosciences Cat#562681; RRID: AB_2722581; Clone: E50-2440; Dilution 1:500 
Rat monoclonal anti-mouse CD45 (BV570) BioLegend Cat#103136; RRID: AB_2562612; Clone: 30-F11; Dilution 1:600 

Rat monoclonal anti-mouse/human CD11b (BV605) BioLegend Cat#101237; RRID: AB_11126744; Clone: M1/70; Dilution 1:600
Mouse monoclonal anti-mouse, rat XCR1 (BV650) BioLegend Cat#148220; RRID: AB_2566410; Clone: ZET; Dilution 1:200

Rat monoclonal anti-mouse Ly-6C (BV711) BioLegend Cat#128037; RRID: AB_2562630; Clone: HK1.4; Dilution 1:800 

Armenian Hamster monoclonal anti-mouse CD11c (FITC) BD Biosciences Cat#553801;RRID:AB_395060; Clone HL3; Dilution 1:400 
Armenian Hamster monoclonal anti-mouse CD103 (PerCP/Cyanine5.5) BioLegend Cat#121416; RRID: AB_2128621; Clone 2E7;
Dilution 1:300
Armenian Hamster monoclonal anti-mouse FceRla (PE) ThermoFisher Cat#12-5898; RRID: AB_466027; Clone: MAR-1; Dilution 
1:500 

Rat monoclonal anti-mouse F4/80 (PE-eFluor610) ThermoFisher Cat#61-4801; RRID: AB_2574612; Clone: BM8; Dilution 1:400
Rat monoclonal anti-mouse Ly-6G/Ly-6C (PE-Cy7) ThermoFisher Cat#25-5931-81; RRID: AB_469662; Clone: RB6-8C5; Dilution 
1:400 

Rat monoclonal anti-mouse CD117 (APC) BD Biosciences Cat#553356; RRID: AB_398536; Clone: 2B8; Dilution 1:200 

Rat monoclonal anti-mouse MHC-II (redfluor710) Tonbo Biosciences Cat#80-5321; RRID: AB_2621997; Clone: M5/114.15.2; 
Dilution 1:800 

Rat monoclonal anti-mouse CD90.2 (BV785) BioLegend Cat#105331; RRID: AB_2562900; Clone: 30-H12; Dilution 1:400 


Rat monoclonal anti-mouse EpCAM (PE-Cy7) BioLegend Cat#118216; RRID: AB_1236471; Clone: G8.8; Dilution 1:400 
Rat monoclonal anti-mouse CD25 (BUV395) BD Biosciences Cat#564022; RRID: AB_2722574; Clone: PC61; Dilution 1:400 
Rat monoclonal anti-mouse CD4 (BUV496) BD Biosciences Cat#564667; RRID: AB_2722549; Clone: GK1.5; Dilution 1:500 
Rat monoclonal anti-mouse CD62L (BV605) BioLegend Cat#104438; RRID: AB_2563058; Clone: MEL-14; Dilution 1:600 
Armenian Hamster monoclonal anti-mouse Helios (Pacific Blue) BioLegend Cat#137210; RRID: AB_10575625; Clone: 22F6;
Dilution 1:200
Rat monoclonal anti-mouse CD45 (BV570) BioLegend Cat#103135; RRID: AB_10898325; Clone: 30-F11; Dilution 1:1000 
Rat monoclonal anti-mouse CD44 (BV650) BioLegend Cat#:103049; RRID:AB_2562600; Clone: IM7; Dilution 1:400 
Rat monoclonal anti-mouse CD8a (BV711) BioLegend Cat#100759; RRID: AB_2563510; Clone: 53-6.7; Dilution 1:500 
Rat monoclonal anti-mouse CD90.2 (BV785) BioLegend Cat#105331; RRID: AB_2562900; Clone: 30-H12; Dilution 1:800 
Rat monoclonal anti-mouse FoxP3 (FITC) eBiosciences Cat#:11-5773-82; RRID:AB_465243; Clone: FJK-16s; Dilution 1:400 
Armenian Hamster monoclonal anti-mouse TCR-beta (PerCP/Cy5.5) BioLegend Cat#:109227; RRID: AB_1575176; Clone: 
H57-597; Dilution 1:400 
Rat monoclonal anti-mouse GITR (PE) eBiosciences Cat#12-5874-82; RRID:AB_465986; Clone:DTA-1; Dilution 1:500 

Mouse monoclonal anti-mouse RORγt (PE-CF594) BD Biosciences Cat#562684; RRID:AB_2651150; Clone:Q31-378; Dilution
1:400
Rat monoclonal anti-mouse Ep-CAM (PE/Cy7) BioLegend Cat#118216; RRID:AB_1236471; Clone:G8.8; Dilution 1:1000 
Rat monoclonal anti-mouse KI-67 (AF700) BioLegend Cat#:652420; RRID: AB_2564285; Clone: 16A8; Dilution 1:600 
Validation Antibodies were not validated.


Eukaryotic cell lines 


Policy information about cell lines 


Cell line source(s) HT-29 ATCC, B16-FIt3 provided by Glenn Dranoff (Dana Farber Cancer Institute) 
Authentication Cell lines were not authenticated. 
Mycoplasma contamination Cell lines were not tested for mycoplasma. 


Commonly misidentified lines Cell lines were not identified in ICLAC. 
(See ICLAC register) 


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals Germ Free C57BL/6 mice were purchased from Taconic. Female mice were used in all experiments to facilitate distribution of
animals into experimental groups. Germ Free Foxp3GFPΔCNS1 and Foxp3GFP mice were bred and maintained at Weill Cornell
Medical College Gnotobiotics facility. Male and female littermate mice were used in all experiments. Animals were fed with
autoclaved 5KA1 chow. Germ Free status was routinely checked by aerobic and anaerobic cultures of fecal samples for bacteria
and fungi and by PCR of fecal DNA samples for bacterial 16S and fungal/yeast 18S genes. GF and ex-GF mice were at least 8
weeks old at the initiation of experiments.


SPF mice were housed at the Research Animal Resource Center for Memorial Sloan Kettering Cancer Center and Weill Cornell 
Medical College with 12-hour light/dark cycles under ambient conditions and ad libitum access to food and water. Experimental 
mice were maintained on a standard rodent diet (5053, LabDiet). Dr. Frank Gonzalez (NIH, USA) kindly provided the Nr1h4floxed
mouse strain. Csf1rCre mice were provided by Dr. Frederic Geissman at MSKCC, USA. CD4Cre mice [Tg(Cd4-cre)1Cwi] were
purchased from Jax laboratories and maintained in house. Experimental littermate animals were generated by mating mice
homozygous for the Nr1h4floxed allele, with one of the breeders (male or female) carrying one copy of the Cre-driver gene.
Cells from male and female CD4CreNr1h4floxed or Csf1rCreNr1h4floxed mice were used for in vitro experiments. Male
Csf1rCreNr1h4floxed mice were analyzed at 6-8 weeks of age.


Wild animals This study did not use wild animals. 

Field-collected samples This study did not use field-collected samples. 

Ethics oversight Studies were approved by the IACUCs of Memorial Sloan Kettering Cancer Center and Boehringer Ingelheim Pharmaceuticals, 
Inc. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 
Methodology
Sample preparation This information is included in the methods.
Instrument Flow Cytometry: LSR II (BD Biosciences), Cell Sorting: Aria II (BD Biosciences)
Software FACS Diva v8.0 (BD Biosciences), FlowJo v10.6.1


Cell population abundance Purification by sorting was performed using FACSAria II cell sorter (BD Biosciences), with > 98% purity


Gating strategy Gating strategy is described in the methods section. 


Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information. 
Most proteins associate into multimeric complexes with specific architectures,
which often have functional properties such as cooperative ligand binding or
allosteric regulation. No detailed knowledge is available about how any multimer and
its functions arose during evolution. Here we use ancestral protein reconstruction
and biophysical assays to elucidate the origins of vertebrate haemoglobin, a
heterotetramer of paralogous α- and β-subunits that mediates respiratory oxygen
transport and exchange by cooperatively binding oxygen with moderate affinity. We
show that modern haemoglobin evolved from an ancient monomer and characterize
the historical ‘missing link’ through which the modern tetramer evolved: a
noncooperative homodimer with high oxygen affinity that existed before the gene
duplication that generated distinct α- and β-subunits. Reintroducing just two
post-duplication historical substitutions into the ancestral protein is sufficient to
cause strong tetramerization by creating favourable contacts with more ancient
residues on the opposing subunit. These surface substitutions markedly reduce
oxygen affinity and even confer cooperativity, because an ancient linkage between the
oxygen binding site and the multimerization interface was already an intrinsic feature
of the protein’s structure. Our findings establish that evolution can produce new
complex molecular structures and functions via simple genetic mechanisms that
recruit existing biophysical features into higher-level architectures.


The interfaces that hold molecular complexes together typically involve
sterically tight, electrostatically complementary interactions among
many amino acids. Similarly, allostery and cooperativity usually
depend on numerous residues that connect surfaces to active sites.
The acquisition of such complicated machinery would seem to require
elaborate evolutionary pathways. The classical explanation of this
process, by analogy to the evolution of morphological complexity,
is that multimerization conferred or enhanced beneficial functions,
allowing selection to drive the many substitutions required to build
and optimize new interfaces.

Whether this account accurately describes the evolution of any
natural molecular complex requires a detailed reconstruction of the
historical steps by which it evolved. Haemoglobin (Hb) is a useful model
for this purpose, because the structural mechanisms that mediate
its multimeric assembly, cooperative oxygen binding, and allosteric
regulation are well established. Moreover, its subunits descend by
duplication and divergence from the same ancestral proteins, so their
history can be reconstructed in a single analysis. Despite considerable
speculation, virtually nothing is known about the evolutionary
origin of Hb’s heterotetrameric architecture and the functions that
depend on it.


From monomer to homodimer 


We inferred the phylogeny of Hb and closely related globins (Fig. 1a,
Extended Data Fig. 1a, b, e). The duplication that produced the distinct
Hbα and Hbβ subunits occurred before the last common ancestor of
jawed vertebrates (Fig. 1a). The closest outgroups, myoglobin (Mb),
globin E and globin Y (Extended Data Fig. 1d), are monomers. A more
distant clade of agnathan ‘haemoglobin’ and vertebrate cytoglobin
includes monomers and dimers, but the dimers assemble through
interfaces that differ from each other and from those used in Hb,
indicating parallel acquisition. These observations suggest that the
Hb α₂β₂ heterotetramer evolved from an ancestral monomer via an
unknown intermediate form.

To characterize when and how the tetramer evolved, we first
reconstructed Hb of the ancestral jawed vertebrate by phylogenetically
inferring the maximum a posteriori sequences of the ancestral α- and
β-subunits (Ancα and Ancβ; Fig. 1a, Extended Data Fig. 1b, c). We
coexpressed and purified Ancα and Ancβ and characterized their assembly
using native mass spectrometry (nMS), size-exclusion chromatography
(SEC) and multi-angle light scattering (MALS). Like extant Hb, Ancα
and Ancβ associate into α₂β₂ heterotetramers, with a tetramer–dimer
Fig. 1 | Structure and function of ancestral globins. a, Simplified phylogeny
of vertebrate globins. Icons, oligomeric states. *Approximate likelihood ratio
statistic >10. Complete phylogeny in Extended Data Fig. 1a. Circles,
reconstructed ancestral proteins. Scale bar, substitutions per site. b, nMS
spectra of Ancα/β (top, purple) and Ancα + Ancβ (lower, pink and blue) at
20 µM. Charge states, stoichiometries, and occupancy (fraction of moles of
subunits) shown. Red, analysed by tandem mass spectrometry (MS/MS) in
Extended Data Fig. 2e. c, Dimer-to-tetramer affinity of Ancα + Ancβ (red) and
human Hb (green). Circles, fraction of α + β heterodimers incorporated into
α₂β₂ tetramers, measured once by nMS. Kd ± s.e. (in moles of subunits)
estimated by nonlinear regression. d, e, Oxygen affinity (P50) and
cooperativity (Hill coefficient, n) of Ancα/β and Ancα + Ancβ. IHP, twofold
molar excess of inositol hexaphosphate. Mean ± 95% confidence interval (CI)
from 3–5 replicates (dots) shown. #, P50 significantly different from Ancα/β
under corresponding conditions (P < 0.01, t-test). *Significant cooperativity
(n ≠ 1, P < 0.05, F-test; Extended Data Fig. 1f).


dissociation constant (Kd) of 10 µM, comparable to that of human Hb
(15 µM; Fig. 1b, c, Extended Data Fig. 2a–c, f, i). Expressed in isolation,
Ancα forms homodimers (Extended Data Fig. 3a), and Ancβ forms
homotetramers (Extended Data Fig. 3b), just as extant Hb subunits
do. The heterotetrameric structure of Hb therefore evolved before
the jawed vertebrate ancestor, more than 400 million years ago.
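
How a tetramer–dimer dissociation constant translates into the fraction of dimers assembled into tetramers can be sketched with a simple two-state model. This is an illustrative calculation only, not the authors' fitting procedure: it assumes a single equilibrium 2D ⇌ T with Kd = [D]²/[T], expresses concentrations in units of dimer (whereas the paper reports Kd in moles of subunits), and uses a hypothetical function name.

```python
import math


def fraction_in_tetramer(total_dimer, kd):
    """Fraction of dimers incorporated into tetramers at equilibrium.

    total_dimer: total concentration in dimer equivalents, [D] + 2[T]
    kd:          tetramer-dimer dissociation constant, [D]^2 / [T]
                 (same concentration units as total_dimer)
    """
    if total_dimer <= 0 or kd <= 0:
        raise ValueError("concentrations and Kd must be positive")
    # Mass balance: total = [D] + 2[D]^2 / Kd, a quadratic in the free dimer [D].
    free_dimer = (-kd + math.sqrt(kd * kd + 8.0 * kd * total_dimer)) / 4.0
    return 1.0 - free_dimer / total_dimer


# With total dimer equal to Kd, half of the dimers are in tetramers.
for conc in (1.0, 10.0, 100.0):
    print(conc, round(fraction_in_tetramer(conc, kd=10.0), 3))
```

Fitting this curve to measured fractions across a concentration series, for example by nonlinear least squares, yields an estimate of Kd.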

By contrast, Ancα/β, the pre-duplication ancestral protein,
homodimerizes with a Kd of 9 µM measured by nMS, but does not form
tetramers (Fig. 1b, Extended Data Fig. 2d, f, g). Even at 1.4 mM, no tetramers
are detectable using SEC (Extended Data Fig. 2h). Ancα/β was therefore
a homodimer, with virtually no propensity to tetramerize. This result
is robust even when we incorporate statistical uncertainty about the
ancestral sequence in several alternative constructs (Extended Data
Fig. 4). This is also the most parsimonious history, because extant
Hbα dimerizes and Hbβ tetramerizes when they are expressed in
isolation: a monomeric Ancα/β would imply independent gains of
dimerization, and a tetramer would require early gain of
tetramerization followed by loss in Hbα (Extended Data Fig. 4).

AncMH, the common ancestor of Hb and myoglobin, is monomeric.
No higher-order stoichiometries were detected using nMS
of His-tagged AncMH at 70 µM (Extended Data Fig. 3f). Even at 600
µM, only monomers were apparent using SEC (Extended Data Fig. 2j).
The untagged protein also does not dimerize at concentrations at
which Ancα/β is predominantly dimeric, as shown using SEC and a
globin-specific concentration assay on lysate from transformed cells
(Extended Data Fig. 3d, e). A monomeric AncMH is also the most
parsimonious scenario, because its closest outgroups are all monomers
(Extended Data Fig. 4b–e).


The Ancα/β homodimer is therefore the evolutionary missing link
between an ancient monomer and the Hb heterotetramer. After duplication,
a novel interaction evolved, enabling these dimers to associate
into tetramers.


Evolution of Hb functions 


We characterized the evolution of the functional properties of Hb by
assaying the oxygen-binding characteristics of the ancestral proteins.
The physiological role of modern Hb (loading oxygen in the lungs or
gills and unloading it in the periphery) is possible because Hb binds
and releases oxygen cooperatively and has an affinity lower than that
of myoglobin; its affinity is further reduced by allosteric effectors.
Like human Hb, the coexpressed and copurified complex Ancα + Ancβ
displays measurable cooperativity, and its oxygen affinity is similar to
that of stripped, recombinant human Hb (Fig. 1d, e). The affinity of
Ancα + Ancβ is reduced in the presence of the allosteric effector inositol
hexaphosphate (IHP), although by less than that of human Hb. The
functional characteristics of extant Hb were therefore in place by the
jawed vertebrate ancestor.
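
The physiological argument above can be illustrated with the Hill equation, a standard phenomenological description of cooperative ligand binding. The sketch below is not taken from the paper: the function names, the P50 of 26 torr, the Hill coefficients, and the loading and unloading oxygen tensions are illustrative assumptions, chosen only to show why a cooperative carrier releases a larger share of its oxygen between lungs and periphery than a noncooperative one with the same P50.

```python
def hill_saturation(po2, p50, n):
    """Fractional saturation from the Hill equation: Y = pO2^n / (P50^n + pO2^n)."""
    return po2 ** n / (p50 ** n + po2 ** n)


def oxygen_unloaded(p50, n, po2_loading=100.0, po2_unloading=30.0):
    """Saturation difference between the loading and unloading sites.

    The default oxygen tensions (torr) are assumed textbook-style values,
    not measurements from the paper.
    """
    return hill_saturation(po2_loading, p50, n) - hill_saturation(po2_unloading, p50, n)


if __name__ == "__main__":
    # Compare a noncooperative carrier (n = 1) with a cooperative one (n = 2.7).
    for n in (1.0, 2.7):
        print(f"n = {n}: fraction unloaded = {oxygen_unloaded(26.0, n):.2f}")
```

With equal P50, the cooperative curve is steeper around P50, so more oxygen is released over the same drop in oxygen tension.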

By contrast, the oxygen affinity of Ancα/β is significantly higher than
that of Ancα + Ancβ, and it does not display detectable cooperativity
or allosteric regulation by IHP (Fig. 1d, e, Supplementary Discussion).
The major functional characteristics of modern Hb therefore evolved
between Ancα/β and Ancα + Ancβ, the same interval during which
tetramerization evolved. This also represents the most parsimonious
history: Hb tetramers are cooperative, but Hbα homodimers and Hbβ
homotetramers are not, suggesting that this property did not yet
exist in their common ancestor (Extended Data Fig. 4).

Because Ancα/β lacked cooperativity, allostery, or reduced affinity,
it could not have performed modern Hb’s physiological role in
oxygen exchange. Furthermore, the first step in the evolution of Hb’s
tetrameric architecture (acquisition of homodimerization from a
monomeric ancestor) could not have been driven by selection for
the major functional properties of Hb, because the homodimer did
not possess any of them.


Ancestral and derived interfaces 


Hb assembles via two distinct interfaces on each subunit: IF1 mediates
α1–β1 and α2–β2 contacts, while IF2 mediates α1–β2 and α2–β1 contacts
(Fig. 2a). To determine which interface evolved before Ancα/β, we
applied hydrogen–deuterium exchange mass spectrometry (HDX-MS)
to Ancα/β. We compared patterns of deuterium uptake at high and low
protein concentrations (at which dimers or monomers predominate,
respectively; Extended Data Fig. 2d, f, g). Solvent-exposed residues
incorporate deuterium faster than buried residues, so peptides that
contribute to the dimer interface should exhibit higher deuterium
uptake when the monomeric state predominates. We found that Ancα/β
peptides containing residues in IF1 incorporate significantly more
deuterium under monomer-favouring than dimer-favouring conditions;
no difference was observed for IF2 (Fig. 2b, c, Extended Data Figs. 5–8).
Moreover, mutations in residues in IF1 substantially impair dimerization
of Ancα/β, but a mutation that disrupts IF2 in human Hb has no effect
(Fig. 2d, Extended Data Figs. 7c, 9). Reverting all IF1 residues in Ancα/β
to the amino acid state from AncMH yields predominantly monomers,
but reverting those at IF2 has no effect (Fig. 2d, Extended Data Fig. 7d).
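
The concentration dependence exploited in the HDX-MS comparison follows from a simple monomer-dimer equilibrium. The following is a minimal sketch, not the authors' analysis: it assumes a two-state equilibrium 2M ⇌ D with Kd = [M]²/[D], takes the Kd of 9 μM reported above from nMS, and assumes that the 0.67 and 75 μM concentrations given in the Fig. 2 legend refer to total subunit concentration. The function name is hypothetical.

```python
import math


def monomer_fraction(total_subunit_um, kd_um):
    """Fraction of subunits that are monomeric for 2M <-> D, Kd = [M]^2 / [D].

    Mass balance: total = [M] + 2[D], so 2[M]^2 / Kd + [M] - total = 0.
    The positive root of this quadratic gives the free monomer concentration.
    """
    monomer = (kd_um / 4.0) * (-1.0 + math.sqrt(1.0 + 8.0 * total_subunit_um / kd_um))
    return monomer / total_subunit_um


if __name__ == "__main__":
    kd = 9.0  # uM, dimer dissociation constant quoted in the text
    for conc in (0.67, 75.0):  # uM, assumed to be total subunit concentration
        frac = monomer_fraction(conc, kd)
        print(f"{conc:6.2f} uM: {frac:.0%} monomer, {1 - frac:.0%} in dimers")
```

Under these assumptions, most subunits are monomeric at the low concentration and most are in dimers at the high one, which is what makes the two conditions informative about the interface.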

Ancα/β homodimers therefore assembled via IF1. After duplication,
IF2 evolved, enabling dimers to assemble into tetramers (Fig. 2e).
Corroborating this inference, extant Hbα homodimers assemble via
IF1, whereas Hbβ tetramers use both IF1 and IF2, indicating that IF1
was inherited from their ancestor Ancα/β. The finding that IF2
evolved after the gene duplication explains why Ancα/β is neither
cooperative nor allosterically regulated, because both functions require
IF2-mediated assembly into tetramers.

Fig. 2 | Identification of homodimerization interface in Ancα/β. a, Hb
heterotetramers assemble via two interfaces (IF1, orange; IF2, yellow) on each
subunit. Red and pink surfaces, α-subunits; blue cartoon, β-subunits.
Ancα + Ancβ homology model is shown. b, Deuterium incorporation by an
Ancα/β peptide that contributes to IF1 (Extended Data Fig. 5g, h). Uptake
(mean ± s.e. from three replicates per incubation time) is shown for Ancα/β
(black) and monomeric IF1 mutant P127R (green). c, Each circle shows mean
difference in deuterium uptake by one Ancα/β peptide when expressed at
monomer-favouring versus dimer-favouring concentrations (0.67 and 75 μM,
respectively; three replicates each, with s.e.). Peptides are classified by the
interface to which they contribute and coloured by incubation time. *Mean
uptake in interface category significantly different from other categories
(P < 0.05, permutation test, Extended Data Figs. 6g, 7). d, Dimer and monomer
occupancy by Ancα/β and mutants, assessed using nMS at 20 μM. P127R and
Q40R disrupt IF1 and IF2, respectively. IF1rev and IF2rev revert historical
substitutions to state in AncMH (spectra in Extended Data Fig. 7c, d).
e, Evolution of Hb tetramer. Rectangles, acquisition of IF1 and IF2. C,
cooperative; NC, noncooperative; Mb, myoglobin.


Genetic mechanisms for the new interface 


The causal substitutions for the evolution of heterotetramers from the
homodimer must have occurred on one or both of the post-duplication
branches that lead from Ancα/β to Ancα and Ancβ. On the Ancα branch,
there were only three changes, of which none were at IF2. On the Ancβ
branch, there were 42 changes, including 5 at IF2 and 4 others at IF1
(Fig. 3a).

We introduced the IF2 substitutions into Ancα/β (Ancα/β5) and found
that they confer strong assembly into tetramers when Ancα/β5 is
coexpressed with Ancα; the mixture includes both heterotetramers and
homotetramers (Fig. 3b, Extended Data Fig. 10c, d). A version containing
only four of the substitutions (Ancα/β4) formed homotetramers at
20 μM but did not heteromerize with Ancα. The fifth change (H104E)
therefore confers the capacity to associate with Ancα, probably because
it interacts with His104 on Ancα, forming a hydrogen bond in the
heteromer but clashing in the homomer (Fig. 3b, Extended Data Fig. 10a,
b). Even a subset of just two IF2 changes (Ancα/β2) causes high-affinity
assembly into homotetramers (Kd = 1 μM; Fig. 3b, d, Extended Data
Fig. 10g). The genetic basis for the evolution of a new strong interface
was therefore simple.

The IF2 substitutions are not sufficient to yield specific occupancy
of the α₂β₂ architecture: coexpressing Ancα/β5 with Ancα produces a
mixture of tetramers containing zero, one, or two α-subunits (Fig. 3b,
Extended Data Fig. 10c, d). We hypothesized that IF1 substitutions
conferred heterospecificity by favouring the assembly of heterodimers
across IF1, which then form α₂β₂ heterotetramers across IF2. We
introduced the IF1 substitutions into Ancα/β5 (Ancα/β9) and coexpressed
it with Ancα. As predicted, heterotetramers and heterodimers
predominate over homomers (Fig. 3c). Ancα/β9 + Ancα is poorly soluble,
preventing quantification by nMS, but the addition of five historical
substitutions at sites proximal to the interfaces (Ancα/β14 + Ancα)
improves solubility, and nMS confirmed preferential occupancy of
α₂β₂ heterotetramers (Kd = 6 μM; Fig. 3b, d, Extended Data Fig. 10e, f).
The Hb heterotetramer therefore evolved from the Ancα/β homodimer
via two sets of substitutions. Changes at IF2 created a strong new
interface that conferred tetramerization; changes at IF1 yielded
heterospecificity. In both cases, only a few substitutions were required.


Fig. 3 | Genetic mechanisms of tetramer evolution. a, Homology model of
Ancα + Ancβ tetramer with interface residues substituted between Ancα/β and
Ancβ. Grey surfaces, two Ancα subunits; yellow, IF2; orange, IF1. Blue cartoon,
partial backbone of one Ancβ subunit; sticks, side chains of substituted sites
(cyan, IF2; green, IF1). Labels show state in Ancα/β (lowercase) and Ancβ
(uppercase). *Sites in Ancα/β2; underlined, sites in Ancα/β4. b, Phylogenetic
interval between Ancα/β and Ancα + Ancβ with number of substitutions and
deletions per branch. Venn diagrams, sites substituted at interfaces. Below,
substitutions incorporated in mutant proteins. c, Occupancy of multimers
measured by nMS at 20 μM, as fraction of moles of subunits in each state.
Ancα/β2 was expressed in isolation, so only homomers are plotted. Spectra in
Extended Data Fig. 10. d, SEC of Ancα/β9 + Ancα at 80 μM. Lines, elution
volumes of tetramer (Ancα + Ancβ), dimer (Ancα/β), monomer (human Mb). Pie
chart, proportions of Ancα and Ancα/β9 subunits in tetramer-containing
fraction, measured by denaturing MS (Extended Data Fig. 11e). Top,
electrophoresis of tetramer-containing fraction. e, Dimer-to-tetramer affinity
of Ancα/β2 (blue) and Ancα/β14 + Ancα (orange). Orange circles, fraction of
Ancα/β14 + Ancα heterodimers incorporated into heterotetramers; blue,
fraction of Ancα/β2 homodimers in homotetramers, measured by nMS once.
Kd ± s.e. estimated by nonlinear regression.


Structural mechanisms for the new interface 


We next investigated how so few substitutions could have generated a
new and specific multimeric interaction. Using a homology model of the
heterotetramer, we identified all favourable contacts that mediate
association across the ancestral interfaces and used the phylogeny to
determine when these amino acids evolved (Fig. 4a–c, Extended Data
Fig. 10h, i). The substitutions that conferred tetramerization recruited
residues that already existed on the opposing surface into newly
favourable interactions. All 13 residues that Ancα contributes to IF2 are
unchanged from their ancestral state in Ancα/β, and many were acquired
earlier (Fig. 4c). The IF2 substitutions on the Ancβ branch yielded new
van der Waals contacts and hydrogen bonds with these ancient residues
(Fig. 4c, d). For example, the ring of Trp40 (substituted in Ancβ from the
ancestral glutamine) nestles tightly in an ancient hydrophobic indentation
on Ancα. Similarly, the IF1 substitutions that increase occupancy of the
α₂β₂ heterotetramer all modify interactions with ancient residues that
were conserved on Ancα (Fig. 4b, e).


Fig. 4 | Structural mechanisms of evolution of Hb interfaces. a, Phylogenetic
classification of ancestral states and substitutions. Black, state in AncMH;
purple, substituted from AncMH to Ancα/β; blue or red, substituted from
Ancα/β to Ancβ or Ancα. b, c, Contact maps for residues buried at IF1 (b) and
IF2 (c) of Ancα + Ancβ. Residues coloured by scheme in a. Letters show state in
AncMH (outside, lower case), Ancα/β (middle, lower case) and Ancβ or Ancα
(inside, upper case). Solid lines, predicted hydrogen bonds; dashed lines,
van der Waals interactions. Underlined, substitutions in Ancα/β4;
*substitutions in Ancα/β2. Cylinders, helices (Extended Data Fig. 2a). Oval,
deletion of helix D in Ancα. d, IF2 contacts in Ancα + Ancβ. Grey surface, Ancα,
with yellow IF2; hydrogen-bonding atoms are red (oxygen) or blue (nitrogen),
with bonds as green lines. Cartoon, Ancβ backbone, with IF2-interacting
sidechains (sticks, coloured as in a). e, Close-up of IF1 in Ancα + Ancβ model.
Sticks, hydrogen-bonding residues; spheres, Cα atoms; coloured as in a.

Both interfaces also involve favourable contacts between residues 
that were unchanged from their deep ancestral states in both subu- 
nits. In IF1, for example, R33 on each subunit donates two hydrogen 
bonds to F125 on the facing surface, and both residues evolved before 
AncMH. Each subunit contains both residues, and IF1 occurs twice in 
the tetramer, so these two sites form a total of eight hydrogen bonds 
in the complex (Fig. 4b, e). Similarly, IF2 contains several hydrogen 
bonds and van der Waals interactions between pairs of residues that
originated before Ancα/β.

Because of the exponential relationship between binding energy
and affinity, one substitution can markedly increase the occupancy of
the multimer, if it builds on the foundation of even very weak
interactions between older residues. Satisfying an unpaired hydrogen-bond
donor or acceptor or burying a hydrophobic ring can contribute up
to 16 kJ mol⁻¹ to an association. Each interface occurs twice in Hb
(Fig. 2a), so a substitution that confers a favourable interaction does
so twice in the tetramer, doubling its effect on binding free energy and
reducing Kd by up to six orders of magnitude. A single mutation can
therefore shift occupancy of the tetramer from virtually nonexistent
to the predominant species.
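
The arithmetic behind this estimate can be checked with the standard relation between binding free energy and the dissociation constant, ΔG = RT ln Kd. The sketch below is an illustration rather than the authors' calculation: the temperature of 25 °C and the function name are assumptions, while the 16 kJ mol⁻¹ upper bound and the factor of two come from the text.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def kd_fold_change(delta_g_kj_per_mol, temperature_k=298.15):
    """Factor by which Kd decreases when binding free energy improves by delta_g.

    From delta_G = RT ln(Kd): Kd_old / Kd_new = exp(delta_g / RT).
    The default temperature (25 C) is an assumption.
    """
    return math.exp(delta_g_kj_per_mol * 1000.0 / (R * temperature_k))


if __name__ == "__main__":
    per_contact = 16.0  # kJ/mol, upper bound quoted in the text
    copies = 2          # each interface occurs twice in the tetramer
    fold = kd_fold_change(per_contact * copies)
    print(f"Kd reduced by a factor of {fold:.2e} "
          f"({math.log10(fold):.1f} orders of magnitude)")
```

Under these assumptions the result falls between five and six orders of magnitude, consistent with the "up to six" stated above.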


Mechanisms of cooperativity 


Finally, we sought insight into the evolution of the cooperativity and
reduced affinity of Ancα + Ancβ. Cooperativity in extant Hb involves
two conformational states that all subunits can adopt: one has higher
affinity for oxygen but weaker IF2 contacts between subunits than
the other. Cooperativity is classically thought to be mediated by
an ‘allosteric core’, the set of residues on the helix that connect the
haem to IF2, which is positioned differently in the two conformations.

To understand the mechanisms that triggered the evolution of
cooperativity and reduced oxygen affinity, we first examined the
phylogenetic history of residues in the haem pocket and allosteric
core. At sites within 4 Å of the haem, no substitutions occurred
during the interval when cooperativity was acquired. The vast majority
were acquired before AncMH (Fig. 5a, Extended Data Fig. 1c), including
the proximal histidine, which covalently binds the haem iron and
transduces the movement of the haem upon oxygen binding to the
allosteric core and IF2, thereby causing the other subunits to shift
between low- and high-affinity conformations. Two substitutions
occurred in Ancβ on the helix that connects IF2 to the histidine, but
there were none in Ancα (Fig. 5a), and both subunits make the
conformational transition in extant Hb. These observations suggest that the
structural properties that mediate the allosteric linkage between the
haem-oxygen-binding site and IF2 already existed in Ancα/β, before
cooperativity and tetramerization evolved. Consistent with this idea,
many of the conformational changes that mediate Hb cooperativity,
such as distortion of the haem’s geometry and movement of the
histidine and helix upon oxygen binding, also occur in myoglobin, which
is monomeric and noncooperative.

We hypothesized that, because of this ancient structural connection
between the IF2 surface and the active site, evolution of the intersubunit
interaction across IF2 was sufficient to confer cooperativity and reduce
affinity. We characterized oxygen binding by Ancα/β2, which contains
only two historical substitutions at IF2. As predicted, we found that
these mutations reduce oxygen affinity by two- to threefold compared
to Ancα/β (Fig. 5b); they also confer weak but statistically significant
cooperativity (Extended Data Fig. 5b). Acquisition of the tetrameric
association alone therefore changes the oxygen-binding function of
the protein and confers cooperative oxygen binding.

The tetramer's ability to transition between high- and low-affinity
states, however, is sensitive to mutation. Ancα/β4 and the
Ancα/β14 + Ancα heterotetramer also have reduced oxygen affinity relative
to Ancα/β, but they lose the cooperativity found in Ancα/β2 (Fig. 5b). A
likely explanation is that the additional mutations in these constructs
overstabilize the low-affinity conformation relative to the high-affinity
state. If so, then some of the other substitutions that occurred between
Ancα/β and the cooperative complex Ancα + Ancβ must have tuned
this equilibrium so that both conformations can be occupied, depending
on the oxygen partial pressure (Fig. 5c). The order in which these
changes occurred cannot be resolved: the IF2 substitutions may have
immediately generated a cooperative Hb-like complex, similar to
Ancα/β2; alternatively, cooperativity may have evolved via a low-affinity
tetrameric intermediate, like Ancα/β4 (Fig. 5c).


Evolution of molecular complexity

Our findings establish that a few genetic changes drove the evolution
of Hb’s complex structure and functions from its dimeric precursor.
Other molecular complexes may also have evolved by short mutational
paths. Interactions between proteins and other kinds of substrates,
such as DNA or small molecules, have evolved via one or
a few historical substitutions, and we see no reason why multimeric
interactions in general should be more difficult to evolve. Multimers
can be engineered from non-assembling precursors by one or a few
mutations, and naturally occurring point mutations are known to
cause disease by inducing higher-order complexes.

The simple mechanism by which Hb appears to have evolved its
cooperativity (acquisition of binding to a molecular partner at a new
interface) could explain the origin of cooperativity and allostery in
other systems. If two plausible conditions are met (the new interface
is near or structurally connected to the functionally active site, and
the optimal conformation for binding is different from the optimal
conformation for activity), then binding will impair activity, and vice
versa. Given this tradeoff, the evolution of binding alone will confer
cooperativity or negative allostery.

The history of Hb shows that complex molecular structures and
functions can arise by means other than the long, gradual trajectories
of functional optimization by which biological complexity has long
been thought to evolve. In principle, molecular assemblies could
arise and become more complex via neutral processes, but this
scenario is unlikely if many mutations are required. Our work shows
that the higher-level multimeric state and functional properties of Hb
evolved through just a few mutations, which fortuitously built upon and
interacted with ancient structural features. These older features could
not have been initially acquired because of selection for the functions
of the final complex, because they existed before those functions first
appeared. Some are likely to have originated and been preserved by
selection for more ancient functions, while others may have appeared
transiently by chance. Although evolution of any particular molecular
sequence or architecture without consistent selection for those
properties is vanishingly improbable, our findings suggest that proteins
evolve constantly through a dense space of possibilities in which complex
new interactions and functional states are easily accessible.


Fig. 5 | Evolution of cooperativity by interface acquisition. a, Haem pocket
and IF2 in Ancα + Ancβ. Pink surface, one Ancα. Tan sticks, haem (with green
iron and red oxygen). Spheres, Ancβ residues within 4 Å of haem, coloured by
temporal category: grey, conserved since AncMH (dark grey, iron-coordinating
histidine); purple, conserved since Ancα/β; blue, substituted between Ancα/β
and Ancβ. Sticks, other residues on helix connecting histidine to IF2, coloured
temporally. Yellow, Ancβ residues at IF2. No changes near haem or IF2 occurred
in Ancα. b, Oxygen binding by Ancα/β mutants with historical substitutions.
P50 ± s.e., with Hill coefficient n above, estimated by nonlinear regression under
effector-stripped conditions (raw data in Extended Data Fig. 10j). *Significant
cooperativity (n ≠ 1, P < 0.05, F-test, Extended Data Fig. 1f). Dotted lines,
affinities of Ancα + Ancβ and Ancα/β, which is unaffected by IHP. c, Top,
evolution of the cooperative Hb heterotetramer. Circles and squares,
conformations with high and low oxygen affinity, respectively. Two IF2
substitutions cause homotetramerization, cooperativity, and reduced affinity
(b). Other substitutions that confer heterotetramerization change the relative
stabilities of high- and low-affinity conformations, abolishing or restoring
cooperativity. White box, interval in which order of substitutions is unknown.
Bottom, acquisition of residues in structurally defined categories in Ancα and
Ancβ, coloured by temporal category. No changes occurred in Ancα.


Methods


Sequence data and alignment 
We collected 177 annotated amino acid sequences of haemoglobin
and related paralogues in 72 species from UniProt, Ensembl and NCBI
RefSeq. Sequences were aligned using MAFFT v7. The maximum
likelihood (ML) phylogeny and branch lengths were inferred from the
alignment using PhyML v3.1 and the LG model with gamma-distributed
among-site rate variation and empirical state frequencies. This best-fit
evolutionary model was selected using the Akaike Information
Criterion in ProtTest. Node support was evaluated using the approximate
likelihood ratio test statistic (aLRS), which expresses the difference in
likelihood between the most likely topology and the most likely
topology that does not include the split of interest; aLRS has been shown to
be reasonably accurate, robust, and efficient compared to other means
of characterizing support. The tree was rooted on neuroglobin
and globin X, paralogues that are found in both deuterostomes and
protostomes. Tetrapods possess three Hbα paralogues, one of which is also
known as Hb-zeta. The ML phylogeny inferred from this alignment contained
a weakly supported sister relationship between all Actinopterygian Hbα
genes and one tetrapod Hbα paralogue, to the exclusion of the other two
tetrapod paralogues. This is a nonparsimonious scenario, because it
requires an early gene duplication and subsequent loss of the lineage
leading to those two paralogues in Actinopterygii. We therefore
constrained the topology to unite the three tetrapod Hbα paralogues in a
clade (Extended Data Fig. 1a). PhyML v3.1 was then used to re-infer
the best-fit branch topology and branch lengths given this constraint.
Ancestral sequences were reconstructed and the posterior probability
distributions of ancestral states were inferred using the ML method
in the codeml package in PAML 4.9, given the ML-constrained
phylogeny and branch lengths. Historical substitutions were assigned
to phylogenetic branches as differences between the maximum a
posteriori amino acid states of parent and daughter nodes. The
asymmetry between the branch lengths leading from Ancα/β to Ancα
and to Ancβ has been observed previously and presumably reflects
there being more shared amino acid states between Hbα and the
outgroups (myoglobin, globins E and Y, and so on) than between Hbβ
and the outgroups. The sequences for reconstructed ancestors have
been deposited in GenBank (IDs MT079112, MT079113, MT079114,
MT079115).
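
The branch-assignment rule described above (a substitution is any site at which the maximum a posteriori states of the parent and daughter nodes differ) can be sketched in a few lines. This is a hypothetical illustration, not the authors' pipeline: the function name and the toy sequences are invented, and the inputs are assumed to be already aligned to the same length, with `-` marking gaps.

```python
def assign_substitutions(parent, daughter, gap="-"):
    """Return (site, parent_state, daughter_state) for each differing site.

    Sites are 1-indexed positions in the alignment. Positions where either
    sequence has a gap are skipped here, because they represent insertions
    or deletions rather than amino acid substitutions.
    """
    if len(parent) != len(daughter):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (site, p, d)
        for site, (p, d) in enumerate(zip(parent, daughter), start=1)
        if p != d and gap not in (p, d)
    ]


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not the reconstructed ancestors.
    parent_map = "MKQH-LT"
    daughter_map = "MKWE-LS"
    for site, p, d in assign_substitutions(parent_map, daughter_map):
        print(f"{p.lower()}{site}{d}")
```

The printed labels follow the lowercase-ancestral, uppercase-derived convention used in the figure legends.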


Recombinant protein expression 

Ancestral genes were codon-optimized for expression in Escherichia
coli using CodonOpt and generated by de novo DNA synthesis (IDT
gBlocks). For globin expression, coding sequences were cloned into
a pLIC expression vector without affinity tags and expressed under a
T7 polymerase promoter. For oxygen-affinity measurements, plasmid
pCOMAP, which expresses E. coli methionine aminopeptidase 1
(MAP1), was cotransformed to ensure efficient N-terminal methionine
excision. For co-expression of two globins, sequences were expressed
from a polycistronic operon in plasmid pGM, without tags and under
a T7 promoter, separated by a spacer containing a stop codon and
ribosome binding site. E. coli methionine aminopeptidase 1 (MAP1)
was coexpressed from the same plasmid.

JM109 DE3 E. coli cells (NEB) were transformed and plated onto solid
Luria broth (LB) medium containing 50 μg/ml carbenicillin (and 50 μg/ml
kanamycin, if pCOMAP was being cotransformed). A single colony
was inoculated into 50 ml LB with appropriate antibiotics and grown
overnight. Five millilitres of this culture was inoculated into a larger
500-ml LB culture. Cells were grown at 37 °C and shaken at 225 rpm
in an incubator (New Brunswick I26) until they reached an optical
density at 600 nm (OD600) of 0.4–0.6. The culture was then
supplemented with 0.5 mM isopropyl-β-D-1-thiogalactopyranoside (IPTG)
and 50 mg/l hemin (Sigma). After 4 h of expression at 37 °C, CO was
bubbled through the solution for 10 min and cells were collected by
centrifugation at 5,000g. Protein purification was carried out
immediately after expression.


Protein purification by ion exchange 

Ancα/β, P127R, V119A, Ancα/β4 + Ancα and the alternative ancestral
reconstructions were purified using ion exchange chromatography.
All buffers were saturated with CO before purification and vacuum
filtered through a 0.2-μm PTFE membrane (Omnipore) to remove
particulates. After expression, cells were resuspended in 200 ml of 50 mM
Tris (pH 6.8) with 2 COMPLETE protease inhibitor tablets (Roche) and
0.5 mM DTT. The cell suspension was lysed in 50-ml batches in a glass
beaker using an FB505 sonicator with a power setting of 90%, 1 s on/off
for 2 min. The lysate was then centrifuged at 30,000g to eliminate
cell debris, inclusion bodies and aggregates. The supernatant was
further syringe-filtered using HPX Millex Durapore filters (Millipore). A
HiTrap SP cation exchange (GE) column was attached to an FPLC system
(AKTAprime plus) and equilibrated in 50 mM Tris (pH 6.8). Lysate was
passed over the column. The SP column was washed with 200 ml of 50
mM Tris to eliminate weakly bound contaminants. Bound Hbs were eluted
with a 100-ml gradient of 50 mM Tris (pH 6.9), 1 M NaCl, from 0 mM to
1 M. Fractions (0.5 ml) were collected along the length of the gradient.
The four reddest fractions were collected and then concentrated in an
Amicon Ultra-15 tube by centrifugation at 4,000g to a final volume of
500 μl. The sample was injected into a Sephacryl Hiprep 16/60 S-100
HR size-exclusion column (SEC) for additional purification. The column
was equilibrated in phosphate buffered saline (PBS) at pH 7.4.
Depending on molecular weight, purified globins elute at 48–52 ml (tetramer),
56–60 ml (dimer) or 64–67 ml (monomer). The purity and identity of
isolated proteins were assessed using 20% SDS-PAGE and denaturing
HRA-MS. The purified proteins were concentrated and then flash frozen
with liquid nitrogen until use.


Protein purification by zinc affinity chromatography 

Ancα/β5 + Ancα, Ancα/β9 + Ancα, Ancα/β14 + Ancα, and Ancα + Ancβ
were purified using zinc-affinity chromatography, adapted from a
published method. Buffers were loaded onto the metal affinity
column using an AKTAprime FPLC. To prepare the zinc affinity column,
nickel was removed from a HisTrap column (GE) using stripping buffer
(100 mM EDTA, 100 mM NaCl, 20 mM Tris, pH 8.0). The column was
then washed with diH₂O for five column volumes. Then 0.1 M ZnSO₄
was passed over the column until conductance reached a stable value.
The column was then washed with five column volumes of water. After
expression, cells were resuspended in 50 ml lysis buffer containing 20
mM Tris and 150 mM NaCl (pH 7.4). The cells were sonicated as described
above. The lysate was passed over a zinc-affinity HisTrap column. The
column was washed with 200 ml wash buffer (20 mM Tris and 150 mM
NaCl, pH 7.4). The bound Hbs were eluted with a 50-ml gradient of
imidazole, up to 500 mM, and 0.5-ml fractions were collected during the
run. The four reddest fractions were collected. The Hb-containing
fractions were concentrated and injected into a Sephacryl S-100 HR
column for additional purification, as described above.


Purification of globin Y 

The globin Y sequences of Callorhinchus milii (NCBI reference sequence
NP_001279719.1) and Xenopus laevis (NCBI reference sequence
NP_001089155.1) were synthesized (IDT, Coralville, IA, USA) and cloned
into a pLIC vector with an N-terminal hexahistidine tag (MHHHHHH).
Expression and lysis were carried out under the same conditions as
described above. The bacterial lysate was passed over a 5-ml HisTrap
nickel-affinity column (GE). The column was washed with five column
volumes of wash buffer (20 mM Tris and 150 mM NaCl, pH 7.4). The
bound globins were eluted with a 15-ml gradient of imidazole from 0 to
500 mM; five fractions of equal volume were collected. The three
reddest fractions were combined. The eluted protein was concentrated to
2 ml, passed through a 0.45-μm filter, and subjected to a final purification
by SEC using a Sephacryl S-100 HR column and an AKTA Prime FPLC
system. Globin Y eluted in fractions collected between 61 and 64 ml.


Purification of His-tagged AncMH

The sequence of AncMH was codon-optimized for expression in 
E. coli, synthesized, and cloned into a pLIC vector with an N-terminal 
hexahistidine tag, because untagged AncMH was not readily purifiable. 
Recombinant expression, cell lysis, and purification were carried out 
under the conditions described for globin Y. 


Characterization of protein stability 

Protein stability was measured by circular dichroism (CD) using a JASCO
1500 CD spectrophotometer. Experiments were conducted at a protein
concentration of 10 μM (50 mM sodium fluoride, 20 mM sodium
phosphate buffer) in a 0.2-mm path length quartz cell. CD spectra were
collected at 2 °C intervals (10 min each) as the temperature was increased
from 25 °C to 95 °C. Molar ellipticity at 222 nm was measured four times
at each temperature; the mean was then divided by the value of molar
ellipticity at 222 nm at room temperature (25 °C) to estimate the
fraction of unfolded protein. To estimate the melting point (Tm) of each
protein, a custom script was written to find the best fit parameters
(Tm and slope) for the Boltzmann sigmoid function: fraction unfolded =
1/(1 + e^((Tm − T)/slope)). All three ancestral proteins were stable, with
Tm > 60 °C (Extended Data Fig. 1c).
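
The authors' custom script is not reproduced in the text, so the following is only a minimal stand-in for the fitting step. It assumes the standard two-parameter Boltzmann sigmoid, fraction unfolded = 1/(1 + exp((Tm − T)/slope)), and swaps in a simpler technique than a full nonlinear least-squares fit: because logit(fraction) = (T − Tm)/slope is linear in T, an ordinary linear regression on the logit recovers both parameters. The function names and the synthetic data are hypothetical.

```python
import math


def boltzmann(temp_c, tm, slope):
    """Boltzmann sigmoid: fraction unfolded = 1 / (1 + exp((Tm - T) / slope))."""
    return 1.0 / (1.0 + math.exp((tm - temp_c) / slope))


def fit_boltzmann(temps_c, fractions):
    """Estimate (Tm, slope) by linear regression of logit(fraction) on temperature.

    logit(f) = ln(f / (1 - f)) = T / slope - Tm / slope.
    Points with f <= 0 or f >= 1 are skipped because their logit is undefined.
    """
    points = [(t, math.log(f / (1.0 - f)))
              for t, f in zip(temps_c, fractions) if 0.0 < f < 1.0]
    if len(points) < 2:
        raise ValueError("need at least two points with 0 < fraction < 1")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    gradient = sxy / sxx                    # equals 1 / slope
    intercept = mean_y - gradient * mean_t  # equals -Tm / slope
    return -intercept / gradient, 1.0 / gradient


if __name__ == "__main__":
    # Synthetic melting curve sampled every 2 C from 25 C to 95 C, as in the protocol.
    temps = list(range(25, 96, 2))
    synthetic = [boltzmann(t, tm=70.0, slope=3.0) for t in temps]
    tm, slope = fit_boltzmann(temps, synthetic)
    print(f"Tm = {tm:.2f} C, slope = {slope:.2f}")
```

On noisy data the logit transform gives extra weight to points near 0 and 1, so a direct nonlinear least-squares fit would usually be preferable; the linearized version is used here only because it needs no external libraries.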


High-resolution denaturing mass spectrometry 

Two hundred microlitres of purified protein was placed in a Slide-A-Lyzer
MINI dialysis unit that was suspended in 500 ml of 50 mM ammonium
acetate. The solution was stirred overnight at 4 °C. After dialysis, the
proteins were transferred to a microfuge tube and centrifuged at
30,000g to eliminate aggregates. The concentration was adjusted to
20 μM. Half a microlitre of sample was sprayed using an Agilent 6224
TOF Mass Spectrometer at fragment voltage 200 V. Protein masses were
estimated by maximum entropy mass deconvolution implemented in
MassHunter (Agilent).


Size-exclusion chromatography and multi-angle light scattering

All proteins were converted to the CO-bound form by adding sodium
dithionite to 5 mg/ml, desalting on a Sephadex G-25 desalting column
equilibrated with CO-saturated PBS (150 mM NaCl, pH 7.4), and then
passing CO through the eluent. Protein concentration was measured
by UV absorbance at 280 nm (tryptophan) and 419 nm (HbCO-specific)
using a Nanodrop 2000c (Thermo Scientific). For analytic SEC, a
Superdex 75 10/300 GL column (GE) was equilibrated in CO-saturated PBS,
and then injected with 500 μl sample, using a 500-μl injection loop on an
AKTAprime and monitored by absorbance at 280 nm. For SEC coupled
with MALS, a Superdex 200 10/300 GL column was injected with 150
μl sample on the AKTAprime; light scattering and refractive index of
eluent were measured using a Dawn Helios-II (Wyatt) light scattering
detector and Optilab T-rEX refractometer, respectively. Molar mass
fitting was carried out using Astra software.


Globin concentration assay 

After protein expression, cells harvested by centrifugation from 
one 500-ml culture were resuspended in 15 ml PBS and sonicated as 
described above. Cell debris and aggregate were removed by
centrifugation at 20,000g. Remaining lysate was concentrated to 5 ml in
Amicon Ultra-15 centrifuge concentrators (3,000 NMWL). Five hundred
microlitres of this sample was injected into a Superdex 75 10/300
GL column. Fractions of eluent (0.2 ml) were collected. We took 50 µl
from each fraction and added it to 150 µl Hemoglobin Assay kit reagent
(Sigma) in one well of a 96-well plate. In each plate, 50 µl of a 100 mg/dl
calibrator (Sigma) was also added to 150 µl of Hemoglobin Assay kit
reagent (Sigma) in one well. We used 50 µl PBS added to the 150 µl
reagent as a blank. Absorbance was measured at 400 nm using a Victor
X5 plate reader (PerkinElmer). Haem concentration in each fraction
was measured using the following equation: concentration = 62.5 ×
$(OD_{sample} - OD_{blank})/(OD_{calibrator} - OD_{blank})$ µM.
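As a worked illustration of the calibration formula, the sketch below computes the haem concentration of one fraction from blank-corrected absorbances. The function name and the absorbance readings are hypothetical; only the factor of 62.5 and the form of the ratio come from the text.

```python
def haem_concentration_um(od_sample, od_blank, od_calibrator, factor=62.5):
    """Haem concentration (µM) from absorbances at 400 nm:
    factor * (OD_sample - OD_blank) / (OD_calibrator - OD_blank)."""
    return factor * (od_sample - od_blank) / (od_calibrator - od_blank)

# Made-up readings for a single SEC fraction.
print(haem_concentration_um(od_sample=0.35, od_blank=0.05, od_calibrator=0.65))
```

A sample that reads the same as the calibrator returns the full 62.5 µM, and a sample that reads the same as the blank returns zero.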


Oxygen affinity and cooperativity 

Purified proteins were deoxygenated using sodium dithionite at 10 
mg/ml and immediately passed through a PD-10 desalting column 
(GE Healthcare) equilibrated with 25 ml of 0.01 M HEPES/0.5 mM EDTA 
(pH 7.4). Eluted proteins were concentrated using Amicon Ultra-4
Centrifugal Filter Units (Millipore). Equilibrium oxygen-binding assays
were performed at 25 °C using a Blood Oxygen Binding System (Loligo
Systems), using 0.1 mM protein (haem concentration) dialysed in 0.1 M
HEPES/0.5 mM EDTA buffer. Protein solution was sequentially
equilibrated at 3–5 different oxygen tensions ($P_{O_2}$) yielding 30–70%
saturation while continually monitoring absorbance at 430 nm (deoxy
peak) and 421 nm (oxy/deoxy isosbestic point). Plots of fractional
saturation against $P_{O_2}$ were constructed from these measurements,
and the Hill equation was fit to each plot using OriginPro 2016, yielding
estimates of P50 ($P_{O_2}$ at half-saturation) and the cooperativity
coefficient (n, the slope at half-saturation in the Hill plot, n50). We
calculated 95% CIs on parameter estimates by multiplying the s.e.m. over
replicate experiments by 1.96 (Fig. 1d, e). The statistical significance
of cooperativity was assessed by using an F-test to compare the fit of
the data to a model in which n is a free parameter to a null model in
which n = 1.
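The quantities in this paragraph can be illustrated with a short sketch. It assumes the standard Hill form Y = PO2^n / (P50^n + PO2^n) and takes the 95% CI as the mean ± 1.96 × s.e.m. over replicate estimates. The function names and example values are hypothetical, and the fitting itself (done in OriginPro by the authors) is not reproduced here.

```python
import math
import statistics

def hill_saturation(po2, p50, n):
    """Fractional saturation under the Hill equation."""
    return po2 ** n / (p50 ** n + po2 ** n)

def hill_plot_point(po2, saturation):
    """Coordinates (log PO2, log(Y / (1 - Y))) of one point on a Hill plot.
    The slope of these points is the Hill coefficient n."""
    return math.log10(po2), math.log10(saturation / (1.0 - saturation))

def ci95_from_replicates(estimates):
    """Mean -/+ 1.96 * s.e.m. over replicate parameter estimates."""
    mean = statistics.mean(estimates)
    sem = statistics.stdev(estimates) / math.sqrt(len(estimates))
    return mean - 1.96 * sem, mean + 1.96 * sem

# Made-up example: a cooperative protein with P50 = 10 and n = 2.
x1, y1 = hill_plot_point(5.0, hill_saturation(5.0, 10.0, 2.0))
x2, y2 = hill_plot_point(20.0, hill_saturation(20.0, 10.0, 2.0))
print((y2 - y1) / (x2 - x1))                 # slope of the Hill plot
print(ci95_from_replicates([1.0, 2.0, 3.0]))  # CI from three replicates
```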

To assess the potential for ancestral proteins to have been regulated
by allosteric effectors, assays were performed in stripped medium or
with IHP added at 0.5 mM. Although IHP may not have been the
physiological effector in ancestral organisms, it has been shown to
allosterically regulate Hbs from representatives of all major vertebrate
lineages, whereas other organic phosphates such as
2,3-biphosphoglycerate (BPG), ATP, and GTP have more lineage-specific
effects. IHP therefore serves as a useful ‘all-purpose’ polyanion to test
the allosteric regulatory capacity of the ancestral Hb. There is ample
precedent for using IHP to study Hb allostery irrespective of whether it
is the authentic physiological effector. This is because IHP modulates
Hb-O2 affinity in a manner that is qualitatively similar to those of other
effectors, including BPG, ATP, GTP, and IPP. These molecules all share
the same mechanism of action, reversibly binding a set of cationic
residues in the cleft between β1 and β2 subunits, and thereby stabilizing
the low-affinity T conformation via electrostatic interactions.


Native mass spectrometry 

Proteins were buffer exchanged into 200 mM ammonium acetate 
with a centrifugal desalting column (Micro Bio-Spin P-6, BioRad) and 
loaded into a gold-coated glass capillary. Samples were ionized for MS 
measurement by electrospray ionization. MS and MS/MS ion isolation 
were performed on a Synapt G1 HDMS instrument (Waters Corporation)
equipped with a radio frequency generator to isolate higher m/z
species (up to 32k) in the quadrupole, and a temperature-controlled
source chamber as previously described. Instrument parameters were
tuned to maximize signal intensity for MS and MS/MS while preserving 
the solution state of the protein complexes. All samples were sprayed 
at room temperature. Instrument settings were: source temperature 
of 50 °C, capillary voltage of 1.7 kV, sampling cone voltage of 100 V, 
extractor cone voltage of 5 V, trap collision energy of 25 V, argon flow 
rate in the trap set to 7 ml/min (5.6 x 10 mbar), and transfer collision 
energy set to 15 V. The T-wave settings were for trap (300 m s⁻¹/1.0 V),
IMS (300 m s⁻¹/20 V) and transfer (100 m s⁻¹/10 V), and trap DC bias
(30 V). For MS/MS, ion isolation was achieved using the same settings 
as described above, with the quadrupole LM resolution set to 6. Activa- 
tion of protein complexes for individual monomer identification was 
achieved by increasing the trap collision voltage to 120 V in MS/MS 
mode, with all other settings unchanged. Analysis of the MS and MS/ 
MS data to estimate masses and relative abundances was performed 
with the software program UniDec.
The occupancy of each stoichiometric state was calculated as the
proportion of globin subunits in that state, based on the summed areas
under the corresponding peaks in the spectrum. To estimate $K_d$ of the
monomer-to-dimer transition of Ancα/β, we performed nMS at variable
protein concentrations. At each concentration, the observed fraction
of subunits incorporated into dimers ($F_d$) was estimated as
$F_d = 2x_d/(x_m + 2x_d)$, where $x_m$ and $x_d$ are the sums of the
signal intensities of all peaks corresponding to the monomeric and
dimeric stoichiometries, respectively. This procedure was repeated at a
range of protein concentrations. Nonlinear regression was then used to
find the best-fit value of $K_d$ using the equation

where $P_{tot}$ is the total protein concentration (expressed in terms of
monomer) estimated by UV absorbance at 280 nm. The resulting $K_d$
is expressed in terms of the concentration of globin subunits. We
observed no higher stoichiometries.
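The fitting equation itself is not reproduced in this text, so the sketch below relies on a stated assumption rather than the authors' formula: a simple 2M ⇌ D equilibrium with K_d = [M]²/[D], with both species in molar units. Under that convention, mass balance (P_tot = [M] + 2[D]) gives [M] = (−K_d + √(K_d² + 8·K_d·P_tot))/4 and F_d = 1 − [M]/P_tot. The published K_d is expressed in terms of subunit concentration, so it may differ from this convention by a constant factor. The function names, the grid search, and the data are hypothetical.

```python
import math

def observed_fraction_in_dimers(x_m, x_d):
    """F_d = 2 x_d / (x_m + 2 x_d), from summed nMS peak intensities."""
    return 2.0 * x_d / (x_m + 2.0 * x_d)

def predicted_fraction_in_dimers(p_tot, k_d):
    """Fraction of subunits in dimers for 2M <-> D, assuming
    K_d = [M]^2 / [D] and P_tot = [M] + 2[D] (assumed convention)."""
    monomer = (-k_d + math.sqrt(k_d ** 2 + 8.0 * k_d * p_tot)) / 4.0
    return 1.0 - monomer / p_tot

def fit_kd(p_tots, fractions, kd_grid):
    """Pick the K_d on a grid that minimizes the squared error.
    A stand-in for nonlinear regression, chosen for brevity."""
    def sse(k_d):
        return sum(
            (predicted_fraction_in_dimers(p, k_d) - f) ** 2
            for p, f in zip(p_tots, fractions)
        )
    return min(kd_grid, key=sse)

# Made-up titration generated with K_d = 10 (same units as P_tot).
p_tots = [1.0, 5.0, 20.0, 100.0]
fractions = [predicted_fraction_in_dimers(p, 10.0) for p in p_tots]
print(fit_kd(p_tots, fractions, kd_grid=[1.0, 2.0, 5.0, 10.0, 20.0, 50.0]))
```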

To estimate $K_d$ of the heterodimer–heterotetramer transition in
Ancα + Ancβ (or mutant ancestral globins) we performed nMS at
variable protein concentrations. Because nMS directly quantifies
the abundance of all species in solution, we were able to extract
molarities for the α1β1 heterodimer and α2β2 heterotetramers and
directly calculate the $K_d$ of their association/dissociation equilibrium,
without having to fit a large number of $K_d$ values as part of a coupled
set of many equilibria across many homomeric and heteromeric 
forms. At each concentration, we first calculated the total fraction 
of subunits that were incorporated into haem-bound heterodimers, 
including both free heterodimers and heterodimers assembled into 
heterotetramers, as

where x is the sum of the signal intensities of all peaks corresponding
to the stoichiometry indicated by the subscript; the term for partially
haem-bound heterodimers is the signal intensity of the peaks
corresponding to heterodimers that are only partially haem-bound and
cannot associate into tetramers. The concentration of all haem-bound
subunits incorporated into heterodimers (free heterodimers or assembled
into heterotetramers) was calculated as
$C_{\alpha\beta} = F_{\alpha\beta} \times P_{tot}$. The fraction of all
heterodimers incorporated into heterotetramers was calculated as
$F_{\alpha2\beta2} = 4x_{\alpha2\beta2}/(2x_{\alpha1\beta1} + 4x_{\alpha2\beta2})$.

Assembly of heterodimers into heterotetramers as concentration 
increases was then analysed to find the best-fit value of $K_d$ using
nonlinear regression and the following equation:

The resulting $K_d$ is expressed in terms of the concentration of globin
subunits contained in heterodimers and heterotetramers.

For homotetramerization of globins expressed in isolation, the
$K_d$ of the dimer–tetramer transition was calculated using a similar
approach. The fraction of all subunits incorporated into homodimers
(including both free homodimers and those associated into
homotetramers) was calculated as
$F_d = (2x_d + 4x_t)/(x_m + 2x_d + 4x_t)$, and the concentration of all
dimers was calculated as $C_d = F_d \times P_{tot}$. The fraction of all
dimers that were incorporated into tetramers was calculated as
$F_t = 4x_t/(2x_d + 4x_t)$. Nonlinear regression was then used to fit
$K_d$ to the data using the equation

The resulting $K_d$ is expressed in terms of the concentration of globin
subunits contained in homodimers and homotetramers. For Fig. 3c,
Ancα/β4 was coexpressed with Ancα and fractionated by SEC, and the
tetrameric fraction was analysed by nMS.
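The three quantities defined for the homotetramer analysis can be computed directly from summed peak intensities. The sketch below assumes only the definitions F_d = (2x_d + 4x_t)/(x_m + 2x_d + 4x_t), C_d = F_d × P_tot and F_t = 4x_t/(2x_d + 4x_t); the function name and input values are hypothetical.

```python
def dimer_tetramer_fractions(x_m, x_d, x_t, p_tot):
    """Return (F_d, C_d, F_t) from summed nMS intensities of monomer
    (x_m), dimer (x_d) and tetramer (x_t) peaks and the total subunit
    concentration p_tot."""
    subunits_total = x_m + 2.0 * x_d + 4.0 * x_t
    subunits_in_dimers = 2.0 * x_d + 4.0 * x_t   # free or within tetramers
    f_d = subunits_in_dimers / subunits_total
    c_d = f_d * p_tot
    f_t = 4.0 * x_t / subunits_in_dimers
    return f_d, c_d, f_t

# Made-up intensities at a total concentration of 70 (same units as C_d).
print(dimer_tetramer_fractions(x_m=1.0, x_d=1.0, x_t=1.0, p_tot=70.0))
```

The weights 2 and 4 convert counts of complexes into counts of subunits, which is why F_d and F_t are fractions of subunits rather than of particles.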

Native MS spectra for human Hb and Ancα/β14 + Ancα at high
concentrations contained peaks corresponding to dimers that had lost one
or both haems. In these cases, we calculated $K_d$ values by both including
and excluding these species. For the fits shown in Figs. 1d and 3d, these
peaks were excluded from the analysis; for the fits shown in Extended
Data Fig. 2k, they were included. Both approaches yielded $K_d$ estimates
of the same order, although the fit to the data was much better in the
former case. Spectra for Ancα + Ancβ included twinned peaks, which
represent caesium iodide adducts on tetramers. For the fits shown in
Figs. 1c and 3d, these peaks were excluded; for the fits in Extended Data
Fig. 2i, they were included. Both approaches gave almost identical $K_d$
estimates, although the fit to the data was better in the former case.


Hydrogen-deuterium exchange mass spectrometry 

All chemicals and reagents were purchased from Sigma Aldrich
(Gillingham, UK). Native equilibration buffer contained 100 mM PBS (H2O),
pH 7.4. Labelling buffer contained 100 mM PBS (D2O), pD 7.4. Quench
buffer contained 100 mM potassium phosphate (H2O), pH 1.9, with 1 M
guanidinium chloride. Five microlitres of protein sample was diluted
into 55 µl of a deuterated buffer of the same composition and
corresponding pD. This resulted in a labelling solution ~92% D2O. Samples
were incubated for between 15 s and 1 h at 20 °C before being quenched
with an ice-cold H2O buffer (pH 1.9) of equal volume. The quenched
solution pH was ~2.5 at 0 °C. This was quickly injected into an on-line
HDX manager (Waters, Milford, MA, USA). The sample was injected
onto a 50-µl sample loop at 0 °C before passing over an immobilized
pepsin column (Enzymate Pepsin 5 µm, 2.1 mm × 30 mm, Waters) at
20 °C using an isocratic H2O (0.1% v/v) formic acid solution (200 µl/min).
Peptide products were collected on a trapping column (BEH C18, 1.7
µm, 2.1 mm × 5 mm, Waters) held at 0 °C. After 2 min of collection, and
de-salting, peptides were eluted from the trap column onto an
analytical column (BEH C18, 1.7 µm, 1 mm × 100 mm, Waters) for separation
using a reverse-phase gradient with a flow rate of 40 µl/min. The
elution profile using a H2O/MeCN (+0.1% formic acid v/v) gradient was
as follows: 1–7 min 97% water to 65% water, 7–8 min 65% water to 5%
water, 8–10 min hold at 5% water. The analytical flow rate was 40 µl/min
and the eluate was electrosprayed directly into a Synapt G2Si (Waters,
Wilmslow, UK) Q-ToF instrument for mass analysis.
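The dilution arithmetic behind the stated D2O content can be checked with a short sketch. It assumes, for illustration only, that the labelling buffer is pure D2O and that the protein sample and the quench buffer contribute no deuterium; the function names are hypothetical.

```python
def d2o_fraction_after_labelling(sample_ul, labelling_ul):
    """Volume fraction of D2O after diluting an H2O sample into
    deuterated buffer (buffer treated as 100% D2O, an assumption)."""
    return labelling_ul / (sample_ul + labelling_ul)

def d2o_fraction_after_quench(labelled_fraction, labelled_ul, quench_ul):
    """Volume fraction of D2O after adding H2O quench buffer."""
    return labelled_fraction * labelled_ul / (labelled_ul + quench_ul)

labelled = d2o_fraction_after_labelling(sample_ul=5.0, labelling_ul=55.0)
quenched = d2o_fraction_after_quench(labelled, labelled_ul=60.0, quench_ul=60.0)
print(round(100 * labelled), round(100 * quenched))
```

Under these assumptions, 55/60 gives about 92% D2O during labelling, and an equal-volume quench halves that to about 46%.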

Sample handling was semi-automated using a robotic liquid han- 
dling HDX system (LEAP Technologies, Ringwood, Australia) to ensure 
reproducibility in timings. A blank and cleaning injection cycle was 
performed between each labelling experiment. Mass spectrometry 
conditions were as follows: capillary 2.8 kV, sample cone 30 V, source 
offset 30 V, trap activation 4 V, transfer activation 2 V. The source 
temperature was set to 80 °C and cone gas flow 80 l/h, the desolvation
temperature was 150 °C and the desolvation gas flow was 250 l/h.
LeuEnk was used as an internal calibrant and acquired every 30 s. For
reference, back-exchange was estimated separately using lyophilized
samples of angiotensin II. Angiotensin II was dissolved into D2O
(pH 4.0) and left for 48 h. After that, the sample was loaded onto the
same robotic and UPLC system and analysed after 2 min of trapping to
give a back-exchange of 31.8 ± 0.2%.

Peptides were identified, in the absence of labelling, by
data-independent MS/MS analysis (MS^E) of the eluted peptides and
subsequent database searching in the ProteinLynx Global Server 3.0
software (Waters). Peptide fragments were generated in the trap region
through collisions with Ar gas (0.4 ml/min). Peptide identifications were
filtered according to fragmentation quality (minimum fragmentation
products per amino acid: 0.2), mass accuracy (maximum [MH]+ error: 5
ppm), and reproducibility (peptides identified in all MS^E repeats) before
their integration into HDX analysis. HDX-MS data were processed in
DynamX 3.0 software (Waters), and all automated peptide assignments
were manually verified, with noisy and overlapping spectra discarded.
External Python scripts were written to generate and analyse the Woods
plots from data outputs of DynamX.

Sample concentration was varied to control the relative populations
of monomeric and dimeric species of Ancα/β. After dilution into the
labelling buffer, Ancα/β concentrations were 0.67, 2, 15, and 75 µM; to
avoid significant sample overloading of the column when using high
concentrations of Ancα/β, samples were diluted during quenching
to give an injection quantity of ~15 pmol. To ensure back-exchange
occurred equally across all diluted samples, the final ratio of H2O:D2O
after quenching was kept constant at 54:46 and the pH of the quench
buffer adjusted to pH 2.5. This allowed all concentrations to be
compared without correcting for back-exchange. All automated peptide
assignments were manually verified, with noisy and overlapping spectra
discarded. After processing, a sequence coverage of 91% was achieved
with a redundancy of 5.3.


Statistical comparison of peptides 

For each peptide in the dilution experiment, the difference in deuterium 
uptake between different conditions was normalized by dividing the 
difference by the absolute uptake in the dimeric condition (75 µM). In
Fig. 2c, peptides that incorporated deuterons in the monomeric
condition at quantities statistically indistinguishable from zero (P < 0.01)
were excluded. For peptide locations and alternative normalization 
methods, see Extended Data Figs. 6, 7. A permutation test was used 
to determine whether relative deuterium uptake by residues at IF1 (or 
IF2) was significantly different from that of other residues. To eliminate 
statistical non-independence arising from the fact that many peptides 
overlap, we constructed a non-overlapping peptide set by subsam- 
pling without replacement from the total set of peptides, requiring 
that selected peptides do not share any residues. One thousand such 
non-overlapping peptide sets were constructed, and a P value was 
estimated for each set using the following permutation test. Peptides 
in the nonoverlapping set were partitioned into those containing
residues mapping to IF1 and those containing no IF1 residues; a similar
approach was used to test for a difference between peptides containing 
IF2 residues and those containing none; peptides containing residues 
contributing to both interfaces were excluded. The mean of the meas- 
ured relative uptake difference over peptides in each partition was 
calculated, and the difference between the means of the two partitions 
was determined. A null distribution was then estimated by randomly 
partitioning peptides in the nonoverlapping set into two categories 
(without changing the size of the categories) and calculating the differ- 
ence in means between the two randomly permuted peptide partitions. 
The P value was calculated as the proportion of random partitions in 
which the difference between peptide category means was greater 
than or equal to that of the difference for the empirical categories. 
Extended Data Fig. 5 displays the distribution of P values calculated in
this way for 1,000 non-overlapping peptide sets. An interface category
was identified as having significantly increased uptake if the mean P 
value from this analysis was <0.05. 
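The subsampling and permutation procedure described above might be sketched as follows. This is not the deposited analysis script: the peptide tuples, function names, and example numbers are hypothetical, and the null distribution is approximated by random shuffles of the category labels, which keeps the category sizes fixed.

```python
import random

def non_overlapping_subset(peptides, rng):
    """Greedily sample peptides in random order, keeping a peptide only
    if it shares no residue with those already chosen.
    Each peptide is (start, end, relative_uptake, is_interface)."""
    shuffled = list(peptides)
    rng.shuffle(shuffled)
    chosen, used = [], set()
    for pep in shuffled:
        residues = set(range(pep[0], pep[1] + 1))
        if not residues & used:
            chosen.append(pep)
            used |= residues
    return chosen

def permutation_p_value(values, labels, n_perm, rng):
    """One-sided P value: proportion of label shuffles whose difference in
    category means is >= the observed difference."""
    if not any(labels) or all(labels):
        raise ValueError("both categories must contain at least one peptide")

    def mean_diff(lab):
        inside = [v for v, flag in zip(values, lab) if flag]
        outside = [v for v, flag in zip(values, lab) if not flag]
        return sum(inside) / len(inside) - sum(outside) / len(outside)

    observed = mean_diff(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # category sizes are preserved
        if mean_diff(shuffled) >= observed:
            hits += 1
    return hits / n_perm

# Made-up example: three interface peptides with clearly higher uptake.
rng = random.Random(0)
values = [5.0, 6.0, 7.0, 0.0, 0.1, 0.2, 0.3, 0.1]
labels = [True, True, True, False, False, False, False, False]
print(permutation_p_value(values, labels, n_perm=2000, rng=rng))
```

In the procedure described in the text, this P value would be computed for each of 1,000 non-overlapping peptide sets and the mean compared against 0.05.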


Homology models for Ancα/β IF1 and IF2

Structural modelling of the Ancα/β monomer was performed using
SWISS-MODEL. A deoxy structure of an Hbα monomer contained in
recombinantly expressed human haemoglobin (1A3N) was used as the
template. Hbα was used because its sequence similarity to Ancα/β is
greater than that of any other extant globin. Furthermore, both Hbα
and Ancα/β form homodimers in isolation, unlike Hbβ (which is a
mixture of dimers and tetramers at similar concentrations) or myoglobin.
EMBO PISA was used to identify sites in 1A3N subunits that buried
>50% of their surface area at the interfaces or formed intersubunit
hydrogen-bonding or salt bridge contacts at either IF1 or IF2. The
HADDOCK 2.2 webserver was used to dock two Ancα/β monomers along an
IF1 or an IF2 orientation by specifying the corresponding homologous
residues (1A3N). The best scoring docked complex was used for all
subsequent analyses and visualizations.


Homology models, interface burial, and contact maps for Ancα + Ancβ and Ancα/β14

Structural modelling was performed using SWISS-MODEL. A deoxy
structure of recombinantly expressed human haemoglobin (PDB 1A3N)
was used as the template for Ancα + Ancβ and for Ancα/β14 + Ancα.
The extant Hbα and Hbβ were used as templates because they have
higher sequence identity to Ancα and Ancα/β14, respectively, than
any other globin paralogues. EMBO PISA was used to estimate residue
burial at the interfaces and to predict hydrogen bonds across
interfaces. Residues were classified as contributing to an interface if their
solvent-accessible surface area was reduced by >10% in the assembled
form relative to the nonassembled form. Van der Waals contacts were
identified as pairs of cross-interface atoms with centre-to-centre
distances <3.5 Å, using a custom script. PyMOL v4.19 was used to visualize
and render protein structures. The similarity between interfaces in
the homology model and those in X-ray crystal structures of extant
haemoglobins was assessed by aligning the Ancα/β14 + Ancα tetramer
to Hb from human (1A3N) and rainbow trout (Oncorhynchus mykiss
2R1H) (Extended Data Fig. 10).
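The two geometric criteria in this section, a >10% loss of solvent-accessible surface area and an interatomic distance below 3.5 Å, can be expressed compactly. The sketch below is not the authors' custom script: it takes coordinates and surface areas as plain numbers (parsing of structure files is omitted), and all names and values are hypothetical.

```python
import math

def is_interface_residue(sasa_free, sasa_assembled, threshold=0.10):
    """True if solvent-accessible surface area falls by more than
    `threshold` (as a fraction) on assembly."""
    if sasa_free <= 0.0:
        return False                      # fully buried even when free
    return (sasa_free - sasa_assembled) / sasa_free > threshold

def cross_interface_contacts(atoms_a, atoms_b, cutoff=3.5):
    """Index pairs (i, j) of atoms from two subunits whose
    centre-to-centre distance is below `cutoff` (in ångströms)."""
    return [
        (i, j)
        for i, p in enumerate(atoms_a)
        for j, q in enumerate(atoms_b)
        if math.dist(p, q) < cutoff
    ]

# Made-up coordinates (Å) for one atom in subunit A and two in subunit B.
print(cross_interface_contacts([(0.0, 0.0, 0.0)],
                               [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]))
print(is_interface_residue(sasa_free=100.0, sasa_assembled=85.0))
```

The all-pairs loop is quadratic in the number of atoms, which is acceptable for a single interface but would call for a spatial index on larger assemblies.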


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


Reconstructed ancestral sequences have been deposited in GenBank
(IDs MT079112, MT079113, MT079114, MT079115). Alignment
and inferred phylogeny, raw mass spectra, oxygen-binding data, and
homology model coordinates have been deposited at
https://doi.org/10.5061/dryad.w0vt4b8mx. HDX-MS data are available at
https://doi.org/10.5287/bodleian:5zRrdMB7E.


Code availability 


Scripts for the HDX permutation analysis and for identification
of contacts between subunits in modelled structures have been
deposited at https://github.com/JoeThorntonLab/Hemoglobin-evolution.
Extended Data Fig. 2 | Stoichiometric characterization of ancestral globin
complexes. a, Homology model of Ancα + Ancβ (template 1A3N) showing
haem (tan spheres). Blue cartoon, Ancβ subunits; red, Ancα. Helices and
interfaces are labelled. Green, proximal histidine. b, SEC and multiangle light
scattering of Ancα/β (90 µM) and Ancα + Ancβ (60 µM). Black, relative
refractive index; red, estimated molar mass. Dashed lines, Ancα/β; solid lines,
Ancα + Ancβ. Dashed horizontal lines, expected mass for dimers and tetramers.
c, SEC of human Hb (dashed) and Ancα + Ancβ (solid) at 100 µM. Top inset,
SDS-PAGE of these complexes, with bands corresponding to α- and β-subunits.
Bottom inset, masses estimated by denaturing MS of Ancα + Ancβ, compared
to expected masses based on primary sequence. d, SEC of Ancα/β across a
series of concentrations. Dashed vertical lines, elution peak volumes of human
haemoglobin tetramer and myoglobin monomer. e, Tandem MS of the
heterotetrameric peak in the Ancα + Ancβ nMS (indicated in Fig. 1b). Ejected
monomer and trimer charge series and the subunits they contain are shown.
Pink, Ancα; blue, Ancβ. f, nMS of Ancα + Ancβ and Ancα/β at 4 µM and 100 µM.
Charge series and fitted stoichiometries are indicated. *Unhaemed apo form.
g, Monomer-dimer association by Ancα/β. Abundances of monomers and
dimers were characterized using nMS across a range of concentrations. Circles,
fraction of all subunits that were assembled into dimers as a function of the
concentration of subunits in all states. Nonlinear regression (line) was used to
estimate the dissociation constant ($K_d$, with s.e.). h, SEC of Ancα/β at high
concentrations (purple and grey lines). Black curves show SEC traces of human
Hb and myoglobin for comparison. i, nMS of human Hb at 50 µM. j, SEC of
AncMH (cyan) at a high concentration. SEC traces of human Hb and myoglobin
(black) are shown for reference. Dashed line, Ancα/β dimer elution peak
volume (see f). k, Alternative estimation of affinity of dimer-tetramer
association by nMS. For human Hb (green) and Ancα/β14 + Ancα (orange), the
fraction of heterodimers incorporated into heterotetramers includes both
haem-deficient and holo-heterodimers. For Ancα + Ancβ (red), caesium iodide
adduct was included. Compare to Figs. 1d and 3d. $K_d$ values (with s.e.) were
estimated by nonlinear regression (lines). All concentrations are expressed in
terms of monomer. All nMS and SEC experiments were performed once at each
concentration.
scattering. c, Colorimetric haemoglobin concentration assay. Absorbance
spectra before (black) and after (red) adding 150 µl Triton/NaOH reagent to
50 µl purified Ancα/β. In the presence of reagent, globins absorb at 400 nm.
d, SEC of crude cell lysate after expression of AncMH (purple) and Ancα/β

and Ancα/β (black). f, nMS of His-tagged AncMH at 70 µM, with monomer
charge series indicated. *Cleavage product. Green, apo. Fractional occupancy
of the monomeric form is shown. All experiments were performed once.
Extended Data Fig. 4 | Biochemical inferences about ancestral Hbs are
robust to uncertainty in sequence reconstructions. a–e, Maximum
parsimony inferences of ancestral stoichiometry and interface losses or gains
based on the distribution of stoichiometries among extant globins. a, Hbs in all
extant lineages of jawed vertebrates are heterotetramers, supporting the
inference that AncHb was heterotetrameric. Stoichiometries from
representative species’ Hbs are shown with PDB IDs. b–e, Each panel shows a
hypothetical set of ancestral stoichiometries, plotted on the phylogeny of
extant Hb subunits and closely related globins, with the minimal number of
changes required by each scenario. b, The most parsimonious reconstruction
is that Ancα/β was a homodimer and AncMH was a monomer. c, For Ancα/β to
have been a tetramer, early gain and subsequent loss of IF2 in Hbα would be
required. d, For Ancα/β to have been a monomer, IF1 would have to have been
independently gained in Hbα and Hbβ. e, For AncMH to have been a dimer, IF1
would have to have been lost in lineages leading to the monomers myoglobin
(Mb) and globin E (GbE). The dimeric globins most closely related to Hb,
agnathan ‘haemoglobin’ (aHb) and cytoglobin (Cyg), use interfaces that are
structurally distinct from those in Hb, indicating independent acquisition.
f–j, Alternative reconstructions of Ancα/β are biochemically similar to the ML
reconstruction. f, Alternative ancestral versions of Ancα/β were constructed,
each containing the ML state at every unambiguously reconstructed site
and the second most likely state at all ambiguously reconstructed sites, using
different thresholds of ambiguity. For each alternative reconstruction, the
table shows the threshold posterior probability (PP) used to define an
ambiguous site, as well as the fold-difference in total PP of the entire sequence
and the number of sites that differ from the ML reconstruction. g, SEC at 75 µM
of ML reconstruction of Ancα/β and AltAll reconstructions, which contain all
plausible alternative states with PP above a threshold. Dashed lines show
elution peak volumes for the dimeric ML α/β and monomeric human
myoglobin. Constructs that elute between the expected volumes for dimer and
monomer indicate dimers that partially dissociate during the run. None
tetramerize; all form predominantly dimers, except AltAll(PP > 0.2), which is
~62,000 times less probable than ML and is mostly monomeric. UV traces
were collected once for each construct. h, Oxygen binding curves of
Ancα/β-AltAll(0.25), the dimeric AltAll with the lowest PP, with and without 2×
IHP. Dissociation constant (P50, with s.e.) estimated by nonlinear regression is
shown. Lack of cooperativity is indicated by the Hill coefficient (n50 = ~1.0).
Oxygen binding at each concentration was measured once. i, Alternate globin
phylogeny that is more parsimonious than the ML topology with respect to
gene duplications and synteny but has a lower likelihood given the sequence
data. A version of Ancα/β (Ancα/β-AltPhy) was reconstructed on this
phylogeny. j, SEC of Ancα/β-AltPhy. Dashed lines show expected elution
volumes for various stoichiometric forms.
Extended Data Fig. 5 | HDX-MS of Ancα/β. a–c, Deuterium uptake
measurements across time for three peptides. Left vertical axis, raw deuterium
incorporation; right vertical axis, deuterium incorporation divided by the total
number of exchangeable amide hydrogens per peptide. Uptake curves for four
concentrations of mutants IF1rev and P127R are shown. Each point shows
mean ± s.e. of three replicate measurements. d–f, Raw MS spectra for the
peptides shown in a–c, respectively, at 0.67 µM (red, at which the protein is
monomeric), and 75 µM (purple, at which it is entirely dimeric: see Extended
Data Fig. 2). The traces are slightly offset to allow visualization. One replicate at
each incubation time is shown. g, Amino acids 99 to 111 contact IF1 (orange) or
IF2 (yellow). The homology model of one chain of Ancα/β (cartoon and sticks)
was aligned to the α-subunit of human Hb (PDB 1A3N); β-subunits are shown as
surfaces. h, Normalized deuterium uptake difference (mean ± s.e. from three
replicates), defined as the uptake difference between monomer and dimer
divided by the uptake of the monomer, observed for peptides containing
amino acids 99–111. Grey N-terminal residues do not contribute to uptake.
Amino acid sequences are aligned and labelled (orange dots, IF1; yellow dots,
IF2).
Extended Data Fig. 6 | Statistical analysis of HDX-MS results for peptides
containing interface residues. a, Residues in human Hb (PDB 1A3N) that bury
at least 50% of their surface area in either IF1 (orange) or IF2 (yellow) are shown
as spheres. Red and pink, α-subunits; blue, β-subunits. b, Homology models of
Ancα/β dimer across IF1 (left) and IF2 (right). Two subunits of Ancα/β were
computationally docked using HADDOCK using the α1/β1 interface (IF1, left) or
α1/β2 interface (IF2, right) of human Hb (1A3N) as a template. c, Coverage of
peptides produced by trypsinization of Ancα/β, assessed by MS. Orange and
yellow, sites that bury surface area at IF1 and IF2 in the modelled dimeric
structures, respectively. d, Classification of trypsin-produced peptides that
contribute to IF1 or IF2. Each circle represents one peptide, plotted by average
surface area per residue buried at each interface (total buried area divided by
total number of residues). Dashed lines, cutoffs to classify peptides as
contributing to IF1 (orange) or IF2 (yellow). e, f, Correlation between change in
deuterium uptake and burial of surface area at IF1 or IF2. Each point is one of 47
peptides, plotted according to the normalized difference in deuterium uptake
between concentrations at which monomer or dimer predominates (0.67 or
75 µM, normalized by uptake at 75 µM) and average buried surface area at IF1 or
IF2. r, Pearson correlation coefficient. g, Permutation test to evaluate the
difference in deuterium uptake at two time points by peptides containing IF1
versus all other peptides (orange), or IF2 versus all other peptides (yellow). To
avoid non-independence, the experimental data were reduced to a set of
nonoverlapping peptides by sampling without replacement. Peptides were
categorized by whether they contained residues at IF1, IF2, or neither; peptides
that contributed to both IFs were excluded. For each interface, the mean
uptake by peptides contributing to the interface was calculated, as was the
mean uptake by peptides not in that category, and the difference in means was
recorded. Peptide assignment to categories was then randomized, and the
difference in mean uptake recorded; this permutation process was repeated
until all possible randomized assignment schemes for those peptides had been
sampled once. P value, fraction of permuted assignment schemes with a
difference in mean uptake between categories greater than or equal to that
from the true scheme. This process was repeated for 1,000 nonoverlapping
peptide sets; the histogram shows the frequency of P values across these sets.
Dashed line, P = 0.05.
Extended Data Fig. 7 | Dissection of IF1 and IF2 by HDX-MS and
mutagenesis. a, b, Peptides with residues contributing to IF1 (a) or IF2 (b) that
have the largest relative uptake difference upon dimerization are shown as
purple tubes. Sticks, side chains predicted to contact the other subunit (orange
surface, IF1; yellow surface, IF2). Side chains are coloured orange (IF1) or yellow
(IF2) if they were substituted between AncMH and Ancα/β; purple, unchanged
in that interval; green, site for targeted mutation P127; blue, Q40. Circled
numbers show the rank of each peptide among all peptides for the normalized
difference in deuterium uptake between monomer and dimer conditions.
Homology models of the Ancα/β dimer using half-tetramers of human Hb
(1A3N) are shown. In a, the dimer is modelled using the α1/β1 subunits; in b, it is
modelled on the α1/β2 subunits. c, d, nMS of interface mutants Q40R (at IF2)
and P127R (at IF1) and for mutants IF1rev and IF2rev, in which interface residues
in Ancα/β were reverted to their states in AncMH. All assays at 20 µM.
Stoichiometries and charge states are labelled. Unhaemed peak series due to
haem ejection during nMS is labelled. Spectra were collected once.
Extended Data Fig. 8 | Alternative methods to normalize deuterium
uptake. a, Deuterium uptake difference between monomer (0.67 μM) and
dimer (75 μM) at each time point was normalized by the length of each peptide.
Peptides were categorized by the interface to which they contribute, as in
Fig. 2c. *Interface peptide sets that show significantly increased uptake upon
dilution when compared to peptides outside of that interface, as determined
by a permutation test (see Extended Data Fig. 6). Each point shows the
mean ± s.e. from three replicates. b, Permutation test to evaluate the difference
in deuterium uptake at 60 min by peptides at each interface, when uptake
difference per peptide is normalized by length (as described in Extended Data
Fig. 6g). Orange, peptides with IF1-containing residues versus those with no IF1
residues. Yellow, IF2-containing peptides versus those with no IF2 residues.
Dashed line, P = 0.05. c, d, Average deuterium uptake difference per residue (c)
and uptake difference normalized by dimer uptake (d) for peptides at different
time points. Orange, IF1 sites; yellow, IF2 sites. Each rectangle shows the
position of the peptide in the linear sequence and its uptake (mean of three
replicates).
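
The permutation test referred to in a and b can be illustrated with a short sketch. This is not the authors' script: the function name, the one-sided difference-in-means statistic and the toy uptake values are all assumptions made for the example, and the test is done exactly (by enumerating every relabelling) only because the toy data set is small.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(interface, other):
    """One-sided exact permutation test on the difference in group means.

    Returns the fraction of relabellings whose mean difference
    (interface minus other) is at least as large as the observed one.
    """
    pooled = list(interface) + list(other)
    k = len(interface)
    observed = mean(interface) - mean(other)

    hits = 0
    total = 0
    # Enumerate every way of labelling k of the pooled peptides as "interface".
    for idx in combinations(range(len(pooled)), k):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = mean(group_a) - mean(group_b)
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
        total += 1
    return hits / total


# Hypothetical length-normalized uptake differences (arbitrary units).
if1_peptides = [0.30, 0.25, 0.28]
non_if1_peptides = [0.05, 0.02, 0.08, 0.04, 0.06]

p_value = exact_permutation_p(if1_peptides, non_if1_peptides)
print(f"one-sided P = {p_value:.4f}")
```

With these toy numbers only 1 of the 56 possible relabellings reaches the observed difference, so P = 1/56, which is below the 0.05 threshold marked by the dashed line in b.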

Extended Data Fig. 9 | Effect of interface-disrupting mutations on Ancα/β.
a, b, SEC of mutants at IF2 (Q40R and IF2rev, which reverts all substitutions that
occurred between AncMH and Ancα/β at IF2 sites) and at IF1 (P127R and IF1rev)
at 100 μM. Dashed line, elution peak volume for Ancα/β. c, Circular dichroism
spectra for P127R and Ancα/β, showing comparable helical structure. d, SEC
from IF1 mutant V119A at 64 μM, compared to Ancα/β. e, nMS of Ancα/β, P127R
and IF1rev at 10 μM. Stoichiometries and charges are shown. For a-d, nMS and
SEC experiments were performed once per concentration. f, Normalized
deuterium uptake by IF1-containing peptide 106-111 in HDX-MS of Ancα/β
(75 μM) and mutants P127R (2 μM) and IF1rev (2 μM). Mean ± s.e. of three
replicates. g, h, Difference between deuterium uptake by each peptide in
Ancα/β and uptake by the same peptide in IF1 mutants P127R (g) and IF1rev (h),
both at 2 μM, normalized by uptake in Ancα/β. Peptides are classified by
interface category. Mean ± s.e. of three replicates. *Peptide sets that have
significantly increased relative uptake (by permutation test, see Extended
Data Fig. 6) compared to all other peptides (peptides containing both IF1 and
IF2 residues excluded).
Extended Data Fig. 10 | Genetic mechanisms of tetramer evolution. a, c, SEC
of Ancα/β containing sets of historical substitutions, when coexpressed and
purified with Ancα. Dashed lines, elution volumes of known stoichiometries
(4-mer, Ancα + Ancβ; 2-mer, Ancα/β; monomer, human myoglobin). Pie charts,
relative proportions of α (pink) and α/β mutant (purple) subunits in fractions
corresponding to each peak, as determined by high-resolution MS (Extended
Data Fig. 11). b, nMS of tetrameric fraction in a at 20 μM (monomer
concentration). *Apparent impurity. Together, a and b show that tetramers
formed by coexpression of Ancα/β4 + Ancα incorporate virtually no
α-subunits. Occupancy from this experiment is shown in Fig. 3b. d, f, nMS of
unfractionated purified protein complexes of Ancα/β5 + α and Ancα/β14 + α at
20 μM. Charge series, stoichiometries indicated. Red arrows, peaks isolated for
further characterization by tandem MS (Extended Data Fig. 11). e, Homology
model of Ancα/β14 + α using human Hb (1A3N) as template. Yellow and cyan
sticks, Ancβ-lineage substitutions on IF2; orange sticks, Ancβ substitutions on
IF1; yellow surface, α IF2; orange surface, α IF1; green, five β substitutions close
to the interfaces included in Ancα/β14 + α. g, nMS of Ancα/β2 across
concentrations. Charge series and stoichiometries indicated. h, Similarity
between interfaces in Ancα/β14 + Ancα homology model and X-ray crystal
structure of human Hb. Venn diagrams show sites buried at IF1 and IF2 in one or
both structures. Small circle, number of shared interface sites with identical
amino acid state. i, Hydrogen-bond contacts at interfaces in Ancα/β14 + α
homology model are also found in X-ray crystal structures of extant
haemoglobins. Residue pairs hydrogen-bonded in Ancα/β14 + α IF2 (yellow)
and IF1 (orange) are listed; +also present in crystal structure; *interactions
discussed in the main text. PDB identifiers are shown. j, Oxygen equilibrium
curves of Ancα/β14 + α, Ancα/β4, Ancα/β2. All experiments were performed
once per concentration. Lines, best-fit curves by nonlinear regression.
Extended Data Fig. 11 | Stoichiometric characterization of Ancα/β
containing historical substitutions. a, SEC of Ancα/β5. Circles show
stoichiometry associated with each peak’s elution volume. b, High-resolution
accuracy mass spectrometry (HRA-MS) of Ancα/β5 + α. Purple circles, peaks
associated with Ancα/β5; pink, Ancα. c, HRA-MS of tetramer-containing SEC
fraction of Ancα/β4 + Ancα. d, HRA-MS of monomer-containing SEC fraction of
Ancα/β4 + Ancα. *922 m/z calibration reference standard. e, HRA-MS of
Ancα/β9 + Ancα. f, nMS of tetramer-containing SEC fraction of Ancα/β4 + Ancα
(Fig. 3a, b). Black circle, most abundant peak used for tandem MS. g, Tandem
MS of isolated most-abundant peak in f, showing trimer-containing peaks.
Charge states and number of haems (h) in the 8+ peak are indicated. h,
Monomer-containing (M) peaks. i-k, nMS (i) and tandem MS (j, k) of
Ancα/β14 + Ancα (Fig. 3f) as in f-h. l-n, nMS and tandem MS of Ancα/β5 + Ancα
(Fig. 3c, d) as in f-h. Black dots in n mark charge species produced by cleavage
of Ancα/β5. All experiments were performed once.
Statistics


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient)
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable. 


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 


Data collection NCBI BLAST was used to collect sequences from NCBI databases. 


Data analysis As described in the methods section, MAFFT v7 was used to build sequence alignments. PhyML 3.1 was used to infer phylogeny from
globin alignment. PAML 4.1 was used to infer ancestral sequences using maximum likelihood. PyMOL v1.3 was used to visualize and
render protein structures. MassHunter was used to perform mass deconvolution on high-resolution accuracy mass spectrometry data. UNIDEC
v1.0 was used to fit molar masses to and estimate molar abundances from native mass spectrometry. SWISS-MODEL (online server:
https://swissmodel.expasy.org/) and EMBO PISA v1.48 were used to model protein structures and identify protein-protein contacts.
Custom scripts were used to perform statistical analyses on hydrogen-deuterium exchange data, fit dissociation constants to native MS
data and melting curves to circular dichroism data (see methods). DynamX 3.0 (Waters) was used to process HDX-MS data. OriginPro
2016 was used to fit p50s and Hill coefficient parameters to observed oxygen binding data.


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 


Data 


Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 


- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability 


Reconstructed ancestral sequences have been deposited in GenBank (IDs TBA). Homology model coordinates have been deposited in the Protein Model Database
(IDs TBA). Alignment and inferred phylogeny and raw mass spectra have been deposited in Dryad (URL TBA). Scripts for the HDX permutation analysis
and identification of contacts between subunits in modeled structures have been deposited at GitHub (https://github.com/JoeThorntonLab/Hb_evolution).
Life sciences study design


All studies must disclose on these points even when the disclosure is negative. 


Sample size Not applicable: experiments were performed on purified stocks of recombinantly expressed proteins. Technical replication of assays is 
described in the manuscript. 
Data exclusions No data were excluded from the analyses.


Replication HDX experiments were performed in 3 technical replicates per construct. Measurements of oxygen affinity, cooperativity, and allosteric 
regulation were performed in 3-5 technical replicates per construct. Native mass spectra and size exclusion chromatography were performed 
across multiple concentrations with one measurement per construct/concentration, as described in the manuscript. Error associated with 
replication is reported in the figures and figure legends. 


Randomization Not applicable. The experiments were performed on recombinantly expressed and purified proteins, not on individuals sampled from a 
population and then assigned to groups. 


Blinding Not applicable. The experiments were performed on recombinantly expressed and purified proteins, not on individuals sampled from a 
population and then assigned to groups. 
Geologist Derya Gürer in the remote Anatolian Plateau in Turkey.


VOYAGES OF 


SELF-DISCOVERY 


Scientists explain how fieldwork in remote areas 
prepared them for a pandemic. By Carrie Arnold 


As many parts of the world continue
with or reimpose coronavirus lockdowns,
scientists, like everyone else, are
feeling the effects of cabin fever.
But long stretches of fieldwork
in remote areas and months of working at sea
have left some researchers more prepared
than others. Geologist Derya Gürer at the
University of Queensland in Brisbane, Australia,
says that her experiences during such trips
have taught her how to cope with the
circumstances brought about by the pandemic.

She and four other researchers who 
shared their fieldwork experiences with 
Nature offered the same advice: cut your- 
self and everyone else some slack. “People 


might be struggling with something that 
you are unaware of,” says marine biologist 
Joana Xavier at the University of Porto in 
Portugal. “It’s very important just to be kind.” 


DERYA GÜRER
CONTROL WHAT YOU CAN 


I returned to land in early March after spending
59 days on my first research cruise to map the
sea floor in the Southern Ocean, and I've been
in self-quarantine ever since. I have also spent
time in remote, mountainous regions, usually
for several months at a time, as a field geologist.
On this year’s voyage, we had one big storm.
You're on this huge vessel and you have to trust 
the people who are guiding it. We were watch- 
ing a very skilled crew stabilizing the ship. It 
didn’t make the storm go away, but I focused 
on positive things and on what was within my 
control. That made me look at my worries in 
a different way, almost like an outsider. This 
approach has been helpful while in quarantine. 

I started regularly practising yoga on the 
ship, and, since then, I’ve done it almost daily. 
It has helped me to focus and to control my 
emotions, and to re-evaluate how I am doing 
every day. 

During the entire two months I was on 
board, I watched only one film. I just kept 
busy. I’ve been doing the same thing while in 
isolation because of the coronavirus. Many of 
us are still being told to stay in, and we know 
people are facing hardship and are losing their 
jobs or their lives, but the one thing that I know 
I can control is my thoughts.

We have no direct control over what’s 
happening in the world, what governments 
decide or what rules are imposed. I think it is 
important to remember that this is not going 
to be forever. As a scientist, I try to observe 
what’s going on with a bit of curiosity, as well 
as to observe my own thoughts and emotions
without judgement. 


Derya Gürer is a geologist at the University of
Queensland, Brisbane, Australia. 


LAURA OMDAHL
THINK ABOUT OTHERS 


I worked for five seasons — from August to 
February — at McMurdo Station in Antarctica, 
between 2007 and 2012. I started as the 
beverage supply clerk and ended up super- 
vising the stores at all three US bases. 

At the McMurdo store, we got one shipment
per year, usually from a resupply ship that 
came in January or February, so we had to 
make everything last for the rest of the year. 
At McMurdo, all meals are free in the galley. 
So everything in the store was junk food and 
fizzy drinks, souvenirs, extra toiletries, that 
kind of thing. 

Doritos are the most sought-after treats
down there. Everyone loves Doritos. So I had
to put limits on them. Someone would want an
entire case and I would say, “I’m sorry, you can
have two to three bags a day.” People just don’t
want to think, ‘Okay, so if I eat all the Doritos
now, I won’t have any in January. Or, if someone
else eats all the Doritos, I won’t have any. Oh, so
maybe we should share?’ I tried to pick things
people liked, but obviously we couldn’t get
everything. I’d see people when they first got
there. They’d be mad that I didn’t have their
preferred brand of toothpaste.

When you leave the ice, it’s amazing. You
go to New Zealand, and it has wonderful produce.
The first time you go into a market, it
feels fantastic, because you’re thinking, ‘Oh
my God, an avocado. Look at that salad. Look
at all the fruit, those strawberries.’ I still feel
that way every time I go into a grocery shop.
It hasn’t left me, even now, with the shelves
stripped of many goods. Why are people
complaining? I can get a can of tomatoes. It might
not be the exact can of tomatoes I was looking
for, but there are some, and we have tons of
food to eat.


Laura Omdahl is a former store supervisor at 
the US Antarctic Program, Antarctica. 


CHRIS TURNEY 
KEEP A POSITIVE OUTLOOK


It’s important not to beat yourself up if you 
don’t achieve as much as you’d like. As part 
of my research group’s work on developing 
highly detailed records of past environmental 
conditions preserved in tree rings, ice cores 
and sequences of peat and lake sediment, I’ve 
spent time in the Antarctic. We talk about the
A-factor, which is short for Antarctic factor.

If anything can go wrong, it will. COVID-19 is
the extreme version of the A-factor. And we all
have families and other commitments outside 
work. So accept that not everything you have 
planned will happen. And if you can’t work as 
much as you’d like to, that’s all right. Almost 
everyone in the research community is in the 
same position and it’s more important to keep 
a positive mindset and look out for everyone 
else than to disappear into a frenzy of work. 

On a research field trip, keeping focused 
on getting everyone home safely is a power- 
ful motivator. And this means looking to the 
future. A positive outlook is crucial in this 
regard. I’ve seen people reflect too much 
on their predicament, which distracts them 
from finding a solution. You don’t want people 
withdrawing into themselves, especially if it 
affects anyone else. Keep regular communi- 
cation open and make sure team members are 
working together for the future. Even if you 
can’t find a solution, preparing for whatever 
comes next and beyond helps people’s mental 
health enormously. 

We’re social animals and need contact. 
Social distancing might mean we have to 
keep a physical distance from one another, 
but we can use technology to keep the lines 
of communication open with family, friends 
and colleagues. Mix it up and you'll stay sane 
and achieve more in the long term. 


Chris Turney is a geoscientist and explorer at 
the University of New South Wales, Sydney, 
Australia. 


JOANA XAVIER 
MAINTAIN A ROUTINE


We've been in quarantine since early March, 
and my partner and I think that it’s important 
to keep a routine. I learnt about the importance
of doing so when I started joining research 
cruises in 2010 as part of my work on marine 
sponges. The cruises involved relatively large 
vessels travelling across the North Atlantic, 
in the Nordic seas and across the Arctic mid- 
ocean ridge. 

On board, we worked in shifts, usually 
12 hours on, 12 hours off. We always had a 
routine. Not only did that help us to work effi- 
ciently in a small, confined space, but it also 
helped us to unwind when our work was fin- 
ished. Whether it was a film night or a quiet cup 
of coffee first thing in the morning, the rituals 
kept us anchored while we were at sea. When 
we began to self-isolate after coronavirus 
struck, I used the same strategy without even 
thinking about it. 

From the first week, because we realized the 
lockdown was going to last much longer than 
we originally expected, we created a schedule
with our children, who are five and seven. We 
agreed that the routine needed to include 
everything from playtime to study time, arts 
and crafts, and home tasks such as cooking 
and cleaning up. We asked the children if they 
thought the timetable would work for them 
and we put everyone’s schedule up on the 
living-room wall. We try to stick to it as much 
as possible, because this also allows me to 
work from home. 

It is working quite well. We make sure we get
enough rest; our moods depend a lot on that.
Managing our energy is important. I learnt this
on board and it’s true now at home: we need
our resting time and our fun time.


Joana Xavier is a marine biologist at the 
University of Porto, Portugal. 


RACHEL DOWNEY 
STEP AWAY FROM WORK 


I did my first research cruise as a physical 
geographer in 2013 as part of the British 
Antarctic Survey, and I’ve done three more 
since then. The longest was for seven weeks. 

One of the most important things I’ve learnt 
is that you need to take the time to understand 
your needs — both work and personal. I have 
to know when to put the lid on, when to stop 
working. On some ships, we would finish our 
shift and all have a drink, just to separate the 
day. The ship was our place of work and we 
never got to leave it, but the ritual marked the 
end of work and then we could play darts or 
games. We had things like film nights to look 
forward to. We found ways to make the ship 
work for work and for play. 

Now that I’m at home, I’m really thankful that
we live next to a bush reserve and that we have
a garden. It’s so nice to get out there after work,
with all the soil and the plants. My partner is
learning to play the guitar through YouTube.
It’s so easy for both of us to just keep working,
but I’ve learnt to make myself stop. I try to keep
an office space for work in the dining room and
then walk away.

This is an unprecedented time, and it’s easy
to be hard on each other and ourselves and
say, ‘Oh, I must work twice as hard, or so many
hours, or I must achieve this.’ You see all these
memes about all the incredible things scientists
did during the Black Death. I think it’s a
good time for us to sit back and reflect on our
work, and to look at where we are and what
direction we want to go in.


Rachel Downey is a marine biologist at the 
Australian National University, Canberra. 


Interviews by Carrie Arnold. Interviews have 
been edited for length and clarity. 
A genome-scale metabolic model of yeast. Each coloured sphere represents a substance that the organism uses in metabolism.


THE MICROBIOME 


MODELLERS 


Computational approaches can reconstruct the 
complex interactions between gut bacteria and 
their human hosts. By Michael Eisenstein 


There is something comforting in the
elegance of a chemical reaction. Inputs
and conditions on one side of the
reaction predictably yield a defined set
of products on the other. But this predictability
is quickly lost in complex biological
systems, where thousands of reactions occur
in parallel among vast numbers of interacting
cells.

Consider the human gut microbiome, in 
which roughly 1,000 bacterial species compete 
and collaborate while communicating with 
their host. This crosstalk rapidly becomes too 
complex to capture in a diagram. But under- 
standing these communities is crucial, because 
their biological activity directly affects our 
health and susceptibility to disease. 

Now, systems biologists are building models
to illuminate these black boxes. “The old-school
way of looking at hundreds of thousands of
reactions is just not feasible, and not desirable
either,” says Ines Thiele, a microbiome
researcher at the National University of Ireland
in Galway. “But we have techniques now that try
to help us see what’s happening much quicker
and tackle the emergent complexity that arises.”

Thiele and her colleagues are developing 
mathematical and statistical approaches 
that use a range of data, from areas such as 
genomics and biochemistry, to reconstruct 
highly diverse gut microbial communities com- 
putationally. They capture only a fraction of the 
complex biological reality of the microbiome, 
but can reveal interactions between microbes 
and their hosts that would be near-impossible 
to detect otherwise. 


Polished GEMs 


For many modelling efforts, the starting point 
is a genome-scale metabolic model (GEM). 
Constructed by scanning an organism’s
genome to determine the biochemical
processes it can perform, GEMs are essentially
blueprints of the enzymatic assembly lines in
every microbe. “They’re capturing all of the
metabolic capabilities that the cell has,” says
Jens Nielsen, a systems biologist at Chalmers
University of Technology in Gothenburg,
Sweden. Researchers can then mathematically
model how these inputs and outputs feed into
one another.
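
To make that last step concrete, here is a minimal sketch of how a metabolic network can be written down as reactions with stoichiometric coefficients and checked for balance. It is an illustration only: the four-reaction toy network, the flux values and the function names are invented for the example, and real GEMs hold thousands of reactions and are typically analysed with dedicated modelling software rather than code like this.

```python
# Toy network (hypothetical): each reaction maps metabolite -> stoichiometric
# coefficient. Negative means consumed, positive means produced.
REACTIONS = {
    "uptake":       {"glucose": 1},                  # import into the cell
    "glycolysis":   {"glucose": -1, "pyruvate": 2},  # 1 glucose -> 2 pyruvate
    "fermentation": {"pyruvate": -1, "lactate": 1},  # pyruvate -> lactate
    "secretion":    {"lactate": -1},                 # export out of the cell
}


def net_production(fluxes):
    """Return the net rate of change of every metabolite for given fluxes."""
    totals = {}
    for reaction, flux in fluxes.items():
        for metabolite, coeff in REACTIONS[reaction].items():
            totals[metabolite] = totals.get(metabolite, 0.0) + coeff * flux
    return totals


def is_steady_state(fluxes, tol=1e-9):
    """True if no metabolite accumulates or is depleted inside the cell."""
    return all(abs(rate) <= tol for rate in net_production(fluxes).values())


# Two hypothetical flux assignments (arbitrary units).
balanced = {"uptake": 1.0, "glycolysis": 1.0, "fermentation": 2.0, "secretion": 2.0}
unbalanced = {"uptake": 1.0, "glycolysis": 1.0, "fermentation": 1.0, "secretion": 1.0}

print(net_production(balanced))
print(is_steady_state(balanced), is_steady_state(unbalanced))
```

In the second flux assignment pyruvate is produced twice as fast as it is consumed, so the check fails. Constraints of this kind are what let a model link one reaction's output to another's input.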

This approach is particularly powerful for 
studying the gut, the microbes of which are 
often difficult to cultivate in the laboratory. 
Researchers can also draw on existing 
biological knowledge to extrapolate the 
possible function of a gene using sequence 
similarities with known enzymes, for instance. 
Even sparse information can be useful. 
Systems microbiologist Karsten Zengler at 
the University of California, San Diego, and his 
colleagues, for example, developed a GEM for 
a relatively under-studied species of marine 
diatom (a type of plankton), even though 
functions had been identified for only about 
one-tenth of its genome’. “Surprisingly, it 
worked,” Zengler says. “We have enough infor- 
mation about metabolism to understand what 
they’re doing.” His team’s modelling approach 
assigned functions to more than 1,000 diatom
genes, which collectively perform nearly 
4,500 interconnected biochemical reactions, 
and predicted where in the cells these 
reactions are likely to occur. 
Currently, most microbiome samples are
studied by isolating total DNA from a microbial 
community and breaking it into small pieces 
for sequencing, a strategy called shotgun 
metagenomics. The resulting data sets can 
provide an inventory of the species present in 
a sample without needing to culture or isolate
them. Researchers can combine GEMs to flesh 
out larger microbial community models, and 
identify patterns of enzymatic activity on the 
basis of genes uncovered in the data. They can 
then thread these processes together to under- 
stand which chemicals the particular system is 
taking up, which products are being released, 
and which cells are interacting to drive these 
processes. “You get non-obvious relationships 
between the microbes that you would just 
not see by looking at genome sequences 
themselves,” says Thiele. 

In one study’, Zengler’s group modelled 
the dynamic interplay between the alga 
Chlorella vulgaris and the yeast Saccharomyces 
cerevisiae, uncovering unexpectedly high levels 
of exchange between the two species for certain 
amino acids. Experimental profiling of metabolites
in the sample might have overlooked this
exchange, says Zengler, who offers the analogy
of trying to infer children’s snack preferences
by what’s left on the table after a birthday party.
“You might observe the table and say, ‘Look at
all those apples: that’s what children must like
to eat.’ But that is not the case.”

Not all interactions can be modelled: it’s
currently impractical to capture the full
interplay between all reactions from every cell
at the same time. As a result, these models are
often best suited for testing existing hypotheses,
says Joao Xavier, a systems biologist at
the Memorial Sloan Kettering Cancer Center
in New York City. “For example, we have several
hypotheses about the role of fermentative
bacteria that produce short-chain fatty-acid
compounds, such as butyrate, that are
associated with intestinal health,” he says. “So
we could try to look for genes that belong to
those pathways.” Xavier’s approach draws on
principles used in conventional ecological
modelling, for example equations used
for studying predator-prey interactions
that capture how changes in one population
can affect another. His group then uses
machine-learning techniques to model these
effects in more complex microbial systems.
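
Predator-prey equations of this kind are commonly written in the Lotka-Volterra form. The article does not say which formulation Xavier's group uses, so the sketch below is a generic textbook version; the parameter values are invented and the simple Euler integration step is chosen for brevity, not accuracy.

```python
def lotka_volterra_step(prey, predator, dt, alpha, beta, delta, gamma):
    """Advance the classic predator-prey model by one Euler step.

    d(prey)/dt     = alpha * prey - beta * prey * predator
    d(predator)/dt = delta * prey * predator - gamma * predator
    """
    d_prey = alpha * prey - beta * prey * predator
    d_pred = delta * prey * predator - gamma * predator
    return prey + dt * d_prey, predator + dt * d_pred


def simulate(prey, predator, steps, dt=0.001, **params):
    """Return the trajectory as a list of (prey, predator) tuples."""
    trajectory = [(prey, predator)]
    for _ in range(steps):
        prey, predator = lotka_volterra_step(prey, predator, dt, **params)
        trajectory.append((prey, predator))
    return trajectory


# Hypothetical parameters, in arbitrary units.
PARAMS = dict(alpha=1.0, beta=0.1, delta=0.075, gamma=1.5)

# At the coexistence point (prey = gamma/delta, predator = alpha/beta)
# both derivatives are zero, so the populations do not change.
steady = simulate(prey=20.0, predator=10.0, steps=1000, **PARAMS)

# Starting with extra prey lets the predator grow at the prey's expense.
perturbed = simulate(prey=30.0, predator=10.0, steps=1000, **PARAMS)

print(steady[-1])
print(perturbed[-1])
```

The two runs differ only in the starting prey population, yet the second ends with more predators and fewer prey: a change in one population feeds through to the other, which is the behaviour the ecological equations are meant to capture.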

But even seemingly simple interactions
can hide complexity. Consider the straightforward
example of one microbe producing
a carbohydrate that another microbe likes
to eat. Changes in both the composition of a
community and its environmental conditions
can influence interactions between species,
Zengler notes, citing his yeast-algae study. “In
one condition, they loved each other and were
best buddies and exchanged everything and
grew happily together, better than they did by
themselves,” he says. But something as simple
as changing ammonia levels could make them
more adversarial. “This relationship just got
worse and worse, and at one point they ended
up having a ‘divorce’ and fighting over their
belongings.”


Just part of the picture 

A map of the complex pathways in human metabolism (carbohydrate pathway in red).

Models can reliably draw such inferences only
if they have good underlying data. But those
can be difficult to get. The gene databases
that researchers use to deduce enzymatic
function and other biological activities are 
still incomplete, and even well-characterized 
species can hold surprises. In an effort to 
improve his group’s metabolic models, Nielsen 
tested how well S. cerevisiae grew on different 
carbon-based nutrients’. “We found it used 
many carbon sources that we were not taking 
into account even in our most state-of-the-art 
models,” he says. 

Models of the microbiome in the human gut
might also overlook crucial aspects of the host
environment, such as the physical structure
of the large intestine, a lengthy organ with
a bacterial composition that varies from one
section to the next. “The geometry of the gut
is quite complicated”, as are the mechanics
of faecal matter travelling through it, says
Xavier. There is also extensive communication
between the host and microbiome, both at the
intestinal barrier and through chemical signals
that the intestinal cells subsequently relay
throughout the body.

Researchers can indirectly measure the 
impact of such interactions using samples of 
blood, urine or cerebrospinal fluid. Thiele has 
been developing a ‘virtual metabolic human’ 
that can integrate these and other data with 
microbiome models to create a more holistic 
picture of these interactions in various disease 
states. “We now have about 30 organs or 
tissues, arranged in an anatomically accurate 
manner,” she says. Her group is using those
models to search for microbial perturbations 
that might contribute to disorders such as 
Parkinson’s disease. 

Xavier’s group is collaborating with clinicians 
to understand how antibiotics affect the gut 
microbiomes of bone-marrow-transplant 
recipients, whose immune systems are 
suppressed to minimize rejection of the 
transplant. They found that the resulting 
microbial disruptions can markedly affect the 
function of the immune system after the trans- 
plant, and that supplementing individuals with 
healthy microbial populations could improve 
their recovery*. 

Elhanan Borenstein, a systems biologist 
at Tel Aviv University in Israel, says, “People 
may underestimate the importance of 
modelling as a tool. It’s not just to simulate a 
real environment, but also a way to understand 
first principles.” Armed with these fundamen- 
tals, researchers might ultimately be able to 
identify targeted microbiome interventions 
that meaningfully change clinical outcomes, 
even if much of the system remains a black box. 


Michael Eisenstein is a freelance writer based 
in Philadelphia, Pennsylvania. 
When you’re collecting sea-floor
creatures in the abyss off the
coast of Western Australia,
just about every find is worth
a closer look. This beautiful
crustacean is a squat lobster;
Galacantha rostrata is my best guess. A
remotely operated vehicle (ROV) spotted
it on a bare rock 2.5 kilometres deep in the
Ningaloo Canyon System and sucked the
creature up using its slurp gun.

I was one of about a dozen scientists
aboard the research vessel Falkor for a
month-long biodiversity survey that was 
upended by the coronavirus. We left in early 
March, and were supposed to make a port 
call in Exmouth halfway through to swap 
researchers, but that plan got scrapped 
because of the pandemic. The original 
crew finished the expedition and collected 
30 new species — including swimming 
worms and heavily armoured barnacles — 
from depths exceeding 4 kilometres. 

The squat lobster didn’t survive the trip to 
the surface, so before it starts degrading, I’m 
carefully photographing its features with an 
inexpensive Olympus camera. Sophisticated
photography equipment can really eat into 
a research budget, and this workhorse does 
the job. Many of the photos will be displayed 
at the newly renovated Western Australian 
Museum in Perth, where I’m curator of 
crustaceans and worms, when it reopens. 
We saw some wildly unexpected things 
on this expedition. There was a metre-tall 
hydrozoan, related to jellyfish, that stood 
like a giant flower above the ocean floor. 
And as the ROV returned to the surface
one evening, it passed by a 45-metre- 
long siphonophore, a string-like colonial 
organism that is possibly the longest 
creature ever recorded. The scientists were 
eating dinner in the galley when the animal 
appeared on the live video stream. It was an 
amazing and totally unplanned encounter. 
Down there, you never know what you'll 
find next. 


Andrew Hosie is curator at the Western 
Australian Museum in Perth, Australia, and a 
zoology PhD student at Curtin University in 
Perth. Interview by Chris Woolston. 

Of the top 10 countries in the Nature Index, South Korea is holding
its footing better than most amid the tsunami of research from
China, despite its rank falling from 8th to 9th place in 2016, a
position it retains.

First, some terms: Share, our key metric, measures contribu-
tions to articles in the 82 selected natural-sciences journals in the Index,
based on the proportional contribution to each article of authors by 
country (or institution). Adjusted Share accounts for small changes in 
the number of Index-journal articles over time. 
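
As a rough illustration of how a fractional count such as Share can be tallied, the sketch below assumes that each article carries one unit of credit, split equally among its authors, with each author's portion credited to that author's country. The equal-split rule, the function name `country_share` and the sample data are assumptions made for illustration; they are not taken from the Index's published method.

```python
from collections import defaultdict


def country_share(articles):
    """Sum fractional article credit per country.

    `articles` is a list of articles; each article is a list holding one
    country name per author. Assumption: every author of an article
    receives an equal fraction of that article's single unit of credit.
    """
    share = defaultdict(float)
    for author_countries in articles:
        if not author_countries:
            continue  # skip articles with no listed authors
        credit = 1.0 / len(author_countries)
        for country in author_countries:
            share[country] += credit
    return dict(share)


# Hypothetical sample of two articles (not real Index data).
sample = [
    ["South Korea", "South Korea", "China", "United States"],
    ["South Korea", "China"],
]
shares = country_share(sample)
```

Under this assumption the credits across all countries always add up to the number of articles, which is why a gain for one country must be matched by declines elsewhere.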

Now to South Korea’s performance: China’s gain of 63.5% in adjusted
Share in the Index from 2015 to 2019 translates to declines for other 
countries. It’s a zero-sum equation: the Index journals publish a finite 
number of articles. But among the top 10 countries other than China, 
South Korea, with a mere 6.4% decline in adjusted Share over the past four
years, fares better than all but Switzerland (+2.1%) and Australia (-3.8%). 

Some of South Korea’s research resilience may be due to the fact that 
it achieved much greater growth in collaboration with China than it did 
with any other top 10 country. Its bilateral Collaboration Score with 
China (the sum of Shares on articles with authors from both countries) 
has increased by 140% since 2015. In 2018, China displaced Japan as 
South Korea’s second-most collaborative partner in the Index, after 
the United States. 
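
The bilateral Collaboration Score described above can be sketched in a few lines. This is a minimal illustration, assuming each article's single unit of credit is split equally among its authors; the function name `bilateral_collaboration_score` and the sample articles are hypothetical and not drawn from the Index.

```python
def bilateral_collaboration_score(articles, country_a, country_b):
    """Sum both countries' fractional credit on jointly authored articles.

    `articles` is a list of articles; each article is a list holding one
    country name per author. Assumption: equal credit per author.
    Articles lacking an author from either country contribute nothing.
    """
    score = 0.0
    for author_countries in articles:
        if country_a in author_countries and country_b in author_countries:
            credit = 1.0 / len(author_countries)
            # Count every author affiliated with either of the two countries.
            joint_authors = sum(
                country in (country_a, country_b) for country in author_countries
            )
            score += credit * joint_authors
    return score


# Hypothetical sample (not real Index data).
sample = [
    ["South Korea", "South Korea", "China", "United States"],  # counts: 0.75
    ["South Korea", "Japan"],                                  # no Chinese author
    ["South Korea", "China"],                                  # counts: 1.0
]
score = bilateral_collaboration_score(sample, "South Korea", "China")
```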

South Korean science is poised for further change. Increased com- 
petition from China, plus the fallout from the COVID-19 pandemic, 
pose challenges. Many hope these will be addressed by a new deal for 
science and technology comparable to President Moon’s Green New 
Deal re-election manifesto, which promised to reduce South Korea’s 
net carbon emissions to zero by 2050. 

This year, as the first centres of the government’s flagship Institute 
for Basic Science come up for review, decisions about which centres 
to keep, close or open will further point to how South Korea plans to 
forge its own scientific path. It’s a long road between making a pledge 
and achieving an ambition, but the creativity and determination it has 
demonstrated by its post-war transformation into a global leader in 
innovation, as detailed in this supplement, augur well for further success.


Catherine Armitage 
Chief editor 

Starting small to
spark creativity


South Korea’s flowing research funds are 
being redirected to basic-research grants 
to raise competitiveness by encouraging 
original discoveries. By Mark Zastrow 


When it comes to funding research,
South Korea is in rarefied air. As a
percentage of its gross domestic
product, its spending on research
and development is second only
to Israel's.

The speed of this ascent has been dizzying, 
up from 2.1% of its gross domestic product 
in 2000, when it was equal to the average of 
Organisation for Economic Co-operation and 
Development member nations, to more than 
4.5% in 2018, with Israel alone surpassing it at 
4.9%, according to the latest available figures. 

But this surge has not been even. Rather, it 
has favoured top-down, government-directed 
projects intended to boost competitiveness in 
fields such as artificial intelligence, robotics 
and materials, often in partnership with the 
private sector, consistent with the post-war 
focus on applied research that turned South 
Korea into a leader in semiconductor manufac- 
turing and wireless communication networks. 

New priorities, however, have taken root. 
For the past decade, the necessity to invest in 
basic research has been a common refrain from
South Korean researchers and politicians alike. 
To maintain economic growth, the nation has 
sought to become a ‘first mover’, instead of sim-
ply a ‘fast follower’, rhetoric embraced by the
current liberal president, Moon Jae-in, and his
predecessor, the conservative Park Geun-hye. 
The drive to compete more effectively in strate- 
gic fields continues, but South Korea has begun 
emphasizing bottom-up funding for small 
teams of researchers, betting that strength will 
flow from the creativity of its scientists. 


Spending surge 

On paper, the rhetoric is backed by funding. 
Under President Moon, who was first elected 
in 2017, the basic-research budget for South 
Korea’s main funding agency, the National 
Research Foundation (NRF), has skyrocketed 
under a five-year plan to double it by 2022 to 
2.5 trillion won (US$2 billion). The nation’s 
budget for 2020 allows for a remarkable 18% 
increase from the previous year in government
spending on research and development (R&D), 
to 24.2 trillion won. 

In the private sector, which makes up roughly
three-quarters of R&D spending, basic-research 
spending is also surging, including outlays by 
major conglomerates such as Samsung and 
LG Electronics. “Industry is asking universi- 
ties to produce more and more PhDs,” says Han 
Woong Yeom, a physicist at Pohang University 
of Science and Technology and vice-chair of 
Moon’s presidential advisory council on science
and technology. “The situation cannot be better 
than this, in terms of investment.” 

Spending surged over the past two decades 
on large national projects, such as a lunar 
orbiter scheduled for launch in 2022, a nano- 
materials research effort dubbed the ‘Crea- 
tive Materials Discovery Program’, and Korea’s 
branch of the International Thermonuclear 
Experimental Reactor, the main experiment 
of which is under construction in southern 
France. “Previously, more emphasis was on 
the mission-oriented research towards imme-
diate economic output,” says microbiologist
Jung-Hye Roe, whom Moon tapped to lead the
NRF in 2018. 


Shift in emphasis 


There is a new emphasis on bottom-up fund-
ing through NRF grants. Currently, the NRF
accounts for slightly more than one-quarter
of total government-funded research, with a
budget of close to 7 trillion won in 2020. The 
amount spent on basic-research grants to 
principal investigators is now roughly equal 
to that spent on top-down, mission-oriented 
projects, and Roe hopes to boost the basic- 
research grants still further. 

Recent highlights from these grants include:
a gallium arsenide nanoresonator that can trap
light and change its colour, work published in
Science¹ in January and co-led by Hong-Gyu
Park of Korea University in Seoul; atmospheric
observations from Gosan-ri on Jeju Island as
part of an international study, co-led by
Sunyoung Park of Kyungpook National University
in Daegu and published in Nature in
2019, that identified eastern China as a major
source of the emission of chlorofluorocarbons,
ozone-destroying chemicals that are banned
under the Montreal Protocol;
and the finding that the development of
osteoarthritis is related to how the body processes
cholesterol, in a 2019 Nature study led by
Je-Hwang Ryu of Chonnam National University
in Gwangju and Jang-Soo Chun of the Gwangju
Institute of Science and Technology.

LG's Guide Robot at Incheon International Airport recognizes four languages.

HOLDING STEADY: SELECTED COUNTRIES

In 2016 South Korea dropped from 8th to 9th among the top countries by Nature
Index Share, and has retained 9th spot ever since, between Switzerland (8th in
2019) and Australia (10th). Japan is 5th; United Kingdom is 4th.

How much of each country’s total Nature Index Share is taken up by each broad
subject area. The physical sciences account for a greater proportion of Share in
South Korea than any other subject.

*When comparing data over time, Share values are adjusted to 2019 levels to account for the small annual variation in the total number of articles in the Nature Index journals.

The top-down division can claim successes 
too, including playing a key role in the nation’s
ability to quickly develop and produce diag- 
nostic kits for COVID-19, says Roe. The first four 
companies to win urgent-use approval for their 
tests had all received NRF funding to research 
the production of such kits after the 2015 Mid- 
dle East Respiratory Syndrome (MERS) out- 
break in South Korea infected 186 people and 
killed 36. And in February, the Korea Aerospace
Research Institute, responsible for the nation’s 
space programme, launched the Cheollian 2B 
satellite; its main instrument, the Geostation- 
ary Environment Monitoring Spectrometer 
(GEMS), is designed to monitor air pollution
in the Asia-Pacific region, making it the first
of a new generation of satellites intended to
study global air quality. 

An emphasis on government-mandated 
metrics in hiring and promotion, which critics 
say incentivizes researchers to present work 
of little value in low-quality journals or confer-
ences, remains to be addressed. Last May, the
nation’s education ministry reported that 574 
researchers had participated in international 
conferences organized by ‘predatory publish- 
ers’. “That happens because, so far, we have 
emphasized getting large numbers of quanti- 
tative, shallow publications,” says Roe. 


“This old-style culture may
still be good enough to
produce many papers, but
not original ideas.”


“I think this old-style culture is prohibiting
the booming of our creativity,” says Yeom. 
“This may still be good enough to produce 
many papers, but not original ideas.” 

The outlook is uncertain. Not only does the 
expected heavy economic burden of recover- 
ing from the COVID-19 pandemic threaten to 
curb the growth in R&D budgets, but South 
Korea’s traditional industries — including 
chip-making, ship-making and nuclear power 
— face increased competition from China. 
Political tensions with Japan that flared in July
2019, fuelled by that nation’s legacy of colonial 
rule, threatened to cut off supplies of key mate- 
rials for semiconductor production, exposing 
how heavily the nation’s technology industry 
relies on Japanese manufacturers.

There is talk in policy circles of a science and
technology new deal — a boost in funding and 
programmes to combat the effects of a less glo-
balized world in which South Korea’s access to
scientific supplies and talent is reduced, says 
So Young Kim, director of the Korea Policy 
Center for the Fourth Industrial Revolution, 
a think tank affiliated with KAIST in Daejeon. 
Yeom says researchers stand to benefit from 
a public appetite for societal change, which 
also translates into support for increased sci- 
ence funding. Yeom and other policymakers 
are pushing for greater investment in research 
topics that meet the practical needs of South 
Korean society, such as air pollution and an
ageing population. In response, the Moon 
administration has begun an initiative to 
trace and reduce fine particulate matter across 
northeast Asia, and another to fight dementia. 

Some suspect it will be initiatives such as 
these that do most to move South Korea away 
from the label ‘fast follower’. “The definition of 
top-level research doesn’t need to be the high- 
est impact factor or highest citation count,” 
says Roe. Pursuing research directions that 
benefit South Korean society is “making our 
own path, a new path, rather than following
others”.


Mark Zastrow is a science writer based in Seoul. 
1. Koshelev, K. et al. Science 367, 288–292 (2020).

National vision


The vast IBS network signals South Korea’s ambitions, but research focus 
may be adjusted after centres face scheduled scrutiny. By Mark Zastrow 


South Korea’s highest-profile invest-
ment in basic research is in the Insti-
tute for Basic Science (IBS), a network
of research centres that comprises the
nation’s answer to Germany’s Max
Planck Society and Japan’s RIKEN.

Founded in 2012 after a 2007 promise by
conservative Lee Myung-bak during his suc-
cessful campaign for the country’s presidency,
the institute sought to attract elite researchers
from at home and abroad by giving successful
individual applicants, as centre directors, full
autonomy over their centre’s research and a
budget of 10 billion won (US$8.2 million) per
year for ten years.

The centres cover areas including dark 
matter, nanomaterials, genome engineering 
and climate change. But, after criticism that 
they were taking money away from the greater 
scientific community, the goal of 50 centres 
has been reduced to 30 for now, with average 
budgets trimmed to about 6 billion won. 

IBS president, Noh Do Young, warns that the 
cuts are “likely to inhibit IBS’s efforts to ful- 
fil its missions and visions”, arguing that the 
budgets are not large, considering centres can 
employ dozens of researchers each. 

Many IBS researchers say the level of funding 
has helped their research reach new heights. 
For instance, in March 2020, the Center for 
RNA Research at Seoul National University 
used cutting-edge sequencers purchased with 
IBS funds to become one of the first research 
groups in the world to sequence the transcriptome
(the total product of all expressed genes)
of the coronavirus that caused the COVID-19
pandemic. The sequencers can directly
identify the molecules in RNA sequences, 
as well as where they are modified by other 
molecules. Narry Kim, a biochemist and the 
centre’s director, says the work required an 
interdisciplinary team of virologists, microbi- 
ologists and computational scientists, which 
was enabled by IBS’s steady support. “Without 
IBS funding, I don’t think it would have been 
possible,” she says. 

Among other prominent IBS results, the 
Center for Genome Engineering supplied 
the gene-editing tools in a blockbuster 2017 
study in Nature that claimed the CRISPR-Cas9
gene-editing system was used to correct a
mutation that causes heart disease in viable
human embryos by replacing the mutated
copy from the sperm with the correct gene 
from the egg. The results have been hotly 
debated, as some scientists argue there could 
be alternative explanations. 

In 2018, results reported in Nature by the
Center for Underground Physics, from its laboratory
in a subterranean power plant in Yangyang
County near the country’s east coast,
placed constraints on the theorized particles
that could make up dark matter. And in 2019, a
collaboration including the Center for Climate
Physics in Busan proposed in Nature that the
first ancestors of modern humans came from a
prehistoric wetland in modern-day Botswana,
roughly 200,000 years ago. The work combined
the centre’s supercomputer climate
models with a genetic analysis of populations
in Namibia and South Africa.

Narry Kim, director of the Center for RNA
Research at Seoul National University.


Political strife 


IBS came under heavy scrutiny in 2019, when 
reports surfaced in South Korean media of 
misappropriated funds and nepotism in 
hiring, which had allegedly been uncovered 
by government audits, at several centres. 
Doochul Kim, a physicist who was president 
of IBS at the time, said these were mostly 
the result of administrative errors and that 
the audits were politically motivated by 
liberal lawmakers. In late 2019, Kim announced 
reforms to IBS’s administrative structure, to 
be carried out under his successor, Noh, who 
started in November 2019. 

The year 2020 marks another significant 
milestone for IBS: the first group of centres 
are coming up for review. Unlike at the Max 
Planck Society, centre directors at IBS are 
not awarded a lifetime appointment, and 
centres are subject to a make-or-break review 
after eight years to determine if they will be 
extended beyond their original ten-year remit. 
“This review session will shape the future of 
IBS,” says Noh. He says centres may close as 
a result, although new centres may take their
place. (Noh adds that the process may be 
delayed by the COVID-19 pandemic, if crucial 
on-site visits by international reviewers, sched- 
uled for July, cannot go ahead.) 

The model has its advocates. “I like that 
pressure,” says Andreas Heinrich, who left 
IBM Almaden in San Jose, California, to lead 
the IBS Center for Quantum Nanoscience in 
Seoul. “If I cannot build a top-class research 
centre in eight years, either I’m not the right 
person for the job, or it’s not possible in that 
environment.” 


Mark Zastrow is a science writer based in Seoul. 

South Korea is a global leader in information and communication technologies.


A top-down
reinvention


A concerted government push to make South Korea an innovation
leader, backed by strong investment and systemic reform, has 
brought rapid and long-lasting results. By Leigh Dayton 


South Korea’s position as one of the
world’s most innovative nations is a
remarkable achievement considering
that, for the first half of the twentieth
century, it was an agrarian-based Jap-
anese colony, then a battleground.

It is second only to Germany in Bloom- 
berg’s 2020 Innovation Index, having reigned 
at the top of the 60-country list for the pre- 
vious 5 years. In the separate 2019 Global 
Innovation Index, published by Cornell Uni- 
versity, INSEAD and the World Intellectual 
Property Organization, South Korea is at 
number 11 and Germany is in 9th place among 
the 129 countries ranked. 

Both indices highlight South Korea’s out-
standing performance in research and devel-
opment (R&D) intensity, an indicator based on
R&D investment by government and industry 
and the number of researchers working in and 
between both sectors. For example, South 
Korea had the greatest share of researchers 
who moved from industry to academia in 2017 
to 2019 among 71 countries, data from aca-
demic recruitment firm League of Scholars
show.


Top-down success 


The high R&D intensity that helped South 
Korea become a global leader in information 
and communication technologies has
emerged from a historically ‘top-down’ inno-
vation system that promotes “close collabo-
ration between government, industry, and 
the academic community in the process of 
nation building”, says Tim Mazzarol from 
the University of Western Australia in Perth, 
who specializes in innovation and entre- 
preneurship. 

President Park Chung-hee drove South
Korea’s economic development from 1961,
when he took power in a military coup, until
1979, when he was assassinated. Park shifted
the economy from its post-war dependence
on technology imports and the construction
of industrial facilities by foreign companies to
focus on home-grown labour-intensive indus-
tries, such as clothing and textiles. Crucially,
strong support for R&D was central to his first
Five-Year Economic Development Plan in 1962
and manifest in his establishment of the Korea
Institute of Science and Technology (KIST) in
1966, and the Ministry of Science and Technol-
ogy the following year.

SAMSUNG'S TOP TEN

Samsung Group's top ten collaborating academic partners on articles in the Nature Index journals are split between
United States and domestic institutions. Here they are ranked by bilateral collaboration score (CS), 2015–19.
CS is derived by summing each institution’s Share on the papers to which authors from both have contributed.

Rank  Institution                                         Country        Bilateral CS  Count*
1     Sungkyunkwan University                             South Korea    75.07         159
2     Seoul National University                           South Korea    21.10         41
3     Korea Advanced Institute of Science and Technology  South Korea    20.16         35
4     Stanford University                                 United States  19.29         31
5     University of California, Berkeley                  United States  17.16         51
6     Korea University                                    South Korea    13.62         27
7     Yonsei University                                   South Korea    11.07         22
8     Harvard University                                  United States  9.67          26
9     Pohang University of Science and Technology         South Korea    8.82          16
10    California Institute of Technology                  United States  8.35          12
*Count = Article count

SECOND TO ONE

These instruments supported the emer- 
gence of large industrial groups called chae- 
bols, which were owned and controlled by 
South Korean individuals or families. The 
government pushed the chaebols to invest 
heavily in R&D while shielding them from 
competition. With increased R&D intensity 
that focused on applied knowledge, chaebols 
such as LG, Lotte and Samsung were driven 
towards new heavy industries, including
petrochemicals, car manufacturing and
shipbuilding, as well as consumer electronics.


Samsung — the classic chaebol 


Samsung is a case in point. The company that
started life as a grocery trader in 1938 is now 
South Korea’s largest chaebol, operating in 
industries as diverse as electronics, insurance, 
construction and shipbuilding. In 2018, it 
produced roughly 15% of the nation’s gross 
domestic product. 

Its founder, Lee Byung Chul, with help from 
government protectionist policies, expanded 
into textiles after the Korean War, electronics 
in the 1960s, then heavy industries, aerospace
and computing during the 1970s and early
1980s. By the 1990s and 2000s, Samsung was
a world leader in tablets and mobiles, and in the
design and manufacture of computer chips. 

The company is South Korea’s leading corpo-
rate institution in the Nature Index by far, based
on contributions to research articles published
in the 82 high-quality natural science journals
tracked by the Index. With a Share of 10.36 in
2019, it ranked 28th among the country’s insti-
tutions overall, eclipsing its nearest rival in the
corporate ranks, LG, which had a Share of 1.99. 
Samsung also features in each of South Korea’s 
nine leading corporate-academic collaborative 
pairs in the Nature Index. 

The most productive pairing is with 
Sungkyunkwan University (SKKU) in Seoul, 
with 159 joint articles between 2015 and 2019. 
Their collaboration is particularly strong in 
electrochemistry and the development of new 
energy sources such as lithium-ion batteries 
(J. K. Shon et al. Nature Commun. 7, 11049; 
2016). Other partnerships include Seoul 
National University in Seoul (41 articles) and
the Korea Advanced Institute of Science and 
Technology (KAIST) in Daejeon (35 articles). 


Investing in the future 


Park’s successors continued to promote 
research and innovation as the driver of 
national economic and social advance. Gov- 
ernment and industry investment in R&D 
soared, and basic-research capabilities were 
expanded. By the mid-1980s and early 1990s 
the government’s attention had shifted to 
high-tech industries such as semiconductor 
design and manufacture. For instance, in 1981
it founded KAIST, which remains a leading 
national research university (see ‘Manipulat- 
ing brains with smartphones’). 

Targeted nation-building programmes were 
also established. In 1995, for example, the gov-
ernment began a US$1.5-billion, ten-year plan
to build up the national broadband infrastruc- 
ture and provide public programmes about 
maximizing its use. 

The 1997 Asian Financial Crisis prompted 
many chaebols to shift from the reliance on 
low-value-added exports characteristic of
a ‘tiger’ economy towards technology and 
knowledge-intensive products and services 
such as semiconductors, mobile phones and 
mobile applications. 

Working with chaebols, the government 
began developing regional innovation cen- 
tres such as Gyeonggi, an area of nearly 
13 million people surrounding Seoul, which 
is now regarded as the nation’s economic and
innovative powerhouse. 

The centre brought industry R&D and pro- 
duction infrastructure together with local and 
national universities and research facilities. 
For instance, the Gyeonggi-based Samsung 
Electronics, Samsung’s flagship subsidiary, is 
collaborating with SKKU Chemistry to develop 
a semiconductor material that can reduce the
amount of radiation exposure while taking 
medical X-ray images. By 2010, South Korea 
had 105 regional innovation centres and 18 
techno-parks, as well as 7 federal programmes 
to strengthen the competitiveness of indus- 
trial cluster programmes. 

Although government funding continued 
to promote R&D spending and programmes 
to boost translational development and scien- 
tific, engineering and managerial expertise, the 
weight of major investment in R&D shifted to 
the corporate sector in search of patents and 
profits. Private R&D spending accounted for 
nearly 80% of South Korea’s total R&D spend-
ing in 2019, ahead of leading innovative nations
such as Germany, Sweden and Switzerland, at 
70%. The shift was supported by R&D tax incen- 
tives and importation of foreign technology. 


The new breed 


In the 2010s, small to medium-sized busi- 
nesses in biotechnology, artificial intelligence 
and cybersecurity, and broadband-based firms 
began to emerge. Founded by a new genera- 
tion of entrepreneurs, they were backed by 
government funding and supported by the 
national technological infrastructure. 

Woowa Brothers is one example, among 
many, of the strategy’s success. The Seoul- 
based 2010 start-up exploited the national 
broadband to build a mobile food-delivery 
application connecting restaurants, custom- 
ers and riders. 

In December 2018, Woowa joined the ‘uni-
corn’ club (a rare status denoting a privately
held start-up valued at more than US$1 billion)
with investment from national and interna-
tional venture-capital sources. In December
2019, Germany’s Delivery Hero bought the 
company in a $4-billion deal that will see 
co-founder and chief executive, Kim Bong-Jin,
manage the Asian business, including South 
Korea, Vietnam and Hong Kong. A delivery 
robot, self-driving technology and an online 
customer and revenue-management system 
for restaurants are in development. 

The South Korean government's systematic 
approach has been the crucial factor in creat- 
ing an innovative economy adept at turning 
ideas from laboratories into products and 
industries. Martin Hemmert, an expert in east 
Asian innovation systems, at Korea University, 
adds that the cultural mindset evident in South
Korea helps. “Complacency is not on the cards.
The glass is always half empty,” he says. 

Even so, as Mazzarol concludes: “It’s a mira-
cle when you consider where Korea was.”


Leigh Dayton is a science and innovation writer
based in Sydney, Australia. 

MANIPULATING BRAINS
WITH SMARTPHONES 


A team of researchers at KAIST — not to be 
confused with the KIST, with which it was 
initially integrated — has fulfilled a dream of 
neuroscientists worldwide. 

Working with colleagues at the University 
of Washington in Seattle, they have built a 
novel device capable of remotely controlling 
the brain circuitry of mice, via a smartphone. 
It is the first wireless neural device that 
can continuously deliver multiple drugs 
and coloured light beams to control brain 
circuits. Until now, researchers needed rigid 
metal tubes and optical fibres to accomplish 
the task. 

The device could speed up the study of
diseases such as Parkinson’s, Alzheimer’s,
addiction, depression and pain, says team
leader and electrical engineer Jeong Jae-
Woong. Weighing 2 grams, it uses LEGO-like
replaceable drug cartridges, a probe the
thickness of a human hair, and powerful,
low-energy Bluetooth to deliver the drugs
and light, which turn neurones on or off
without hurting the rodents. This ‘plug-and-
play’ interface was the major challenge,
Jeong says.

After two years of laboratory and animal
trials, the proof-of-concept paper was
published in Nature Biomedical Engineering
(R. Qazi et al. Nature Biomed. Eng. 3, 655–669;
2019) by the team of neuroscientists
and engineers from electrical, mechanical
and software backgrounds. Jeong plans to
commercialize the device and technology.


TALENT SWAP

Cross-sectoral moves between industry and academia per 1,000 researchers 2017–19 are shown for the top 10
countries by Share in the Nature Index. South Korea is a global outlier for its high proportion of researchers
moving from industry to academia. For shifts from academia to industry, it is in the middle of the pack, with a
slightly higher proportion than Canada and slightly less than the UK. Switzerland then the US have by far the
greatest share of researchers moving from academia to industry. Bubbles are sized according to research
scale, a function of the number of research institutions in each country covered by the data set.

Upwardly mobile


With growing numbers of researchers coming and going,
South Korea’s scientific landscape is becoming more
diverse, and more productive. By Chris Woolston 


The proportion of academics who have
relocated to South Korea from other
countries in the past three years is
higher than the global average, sug-
gesting that the country’s drive to
end its comparative scientific isolation may
be bearing fruit.

“It’s becoming a hub for global talent,” says 
Paul McCarthy, chief executive of League of 
Scholars, an academic recruitment firm based
in Sydney, Australia. The firm is using publi- 
cation records to track mobility and produc- 
tivity of more than 2 million researchers from 
around the world, including 26,697 research- 
ers in South Korea, more than 95% of whom 
work in science and technology. 

Author affiliations in the League of Scholars 
data set show that 4.3–4.9% of all academics in
South Korea had moved from abroad between 
2017 and 2020, whether foreign-born research- 
ers relocating from elsewhere, or those return- 
ing home after working abroad. That’s higher 
than the global average of 3.7%. The data on 
these globally mobile researchers indicate that 
the country has recently attracted an influx of 
productive and successful scientists, accord- 
ing to McCarthy. 


High-impact scientists 

Globally mobile researchers in South Korea 
tend to thrive. Researchers who relocated to 
South Korea since 2017 have been found to be 
more than 50% more productive than other 
academics in the country, measured by the 
median number of publications they authored 
or co-authored with at least ten citations in the
previous five years. They also have 39% more life- 
time citations than researchers who remained in 
South Korea during the data-collection period. 
In another measure of performance, the median 
annualized h-index of globally mobile researchers
in Korea is 21% higher than that of their colleagues
who remained at home. 

Researchers who tend to be internationally 
mobile are more productive wherever they 
are found, not just in South Korea, notes Cornelia
Lawson, a science policy researcher at the
University of Manchester, UK. 

Lawson says that working abroad gives
South Korean researchers important insights
into what it takes to get published in journals
with an international reach. South Korean
researchers who publish largely or exclusively
in domestically published journals tend to
miss out on the exposure and citations that
come with higher-impact journals, she says.
“A number of countries in Asia have a strong
local publication culture,” she says. “Domestic
scientists may not understand or feel the need
to publish internationally.”

South Korea has a higher-than-average proportion of academics who have relocated from an overseas institution
in the last 3 years. KAIST has attracted the biggest share of this valuable globally mobile group: the so-called
border-crossers have a 21% higher impact (based on annualized h-index) and 50% higher output (based on
articles over the past 5 years with at least 10 citations) than their stay-at-home colleagues.

When it comes to sheer number of publi- 
cations, however, home-grown researchers 
in South Korea can compare favourably with, 
or outperform, their colleagues from elsewhere.
A 2014 study published in the journal
Minerva found that researchers with domes- 
tic degrees in South Korea, Hong Kong and 
Malaysia published more refereed papers and 
books or book chapters in the natural sciences,
engineering and biomedical sciences than 
researchers holding foreign degrees. Many 
of those papers were in domestic journals, 
where native researchers would have a clear
publication advantage. 

For Moon Kee Choi, a bioengineer who 
works on wearable and implantable electronic 
devices at the Ulsan National Institute of Sci- 
ence and Technology, the monthly ‘happy 
hour’ at the University of California, Berkeley 
(UC Berkeley), where she was a postdoctoral 
researcher from 2017 to 2019, was a career 
boon. Bioengineers exchanged ideas while 
buying snacks and drinks. “Many people in 
South Korea collaborate with others, but until 
now, that collaboration is largely domestic,” 
she says. “Doing an international postdoc is a
good way to make connections.” 

‘Boomerang’ researchers such as Choi, who
return home to South Korea, often have expe- 
rience with both domestic and international 
publishing, and benefit from broader networks 
and more opportunities for international 
collaboration, she says. 

For Choi, those collaborations have translated
into publications. In 2019, she co-authored
a paper on molybdenum sulfide capsules for
cellular microscopy with three UC Berkeley
researchers.

Sanghyuk Wooh, a materials scientist
who studies surfaces at Chung-Ang University
in Seoul, suggests that intense competition
that rewards higher achievers is another reason
why Korean ‘boomerang’ researchers may be
especially productive and impactful. Wooh
completed a postdoc at the Max Planck Institute
for Polymer Research in Mainz, Germany,
before returning to South Korea in 2017. South
Korean PhD graduates going on to postdoctoral
positions in foreign countries tread a well-travelled
path, but the return trip is more challenging,
he says. “Most of the researchers who
go overseas want to come back to South Korea,”
he says. “But there are a limited number of faculty
positions, so it’s very competitive.”

KNOWLEDGE TRANSFERS

Destinations of researchers who have relocated from South Korea since 2017. The US, which is South Korea’s top collaborating country on articles in the Nature
Index journals, is also the favoured location of its research diaspora, landing 575 recruits in the past 3 years. China and Japan are respectively 2nd and 3rd among
South Korea’s research collaborators, yet have received comparatively few of its researchers over the period.


Reaching out 

The path may be somewhat easier for foreign- 
ers. The government has made a concerted 
push to attract and retain international 
talent in recent years, including several special
initiatives to hire overseas researchers. One 
example is the World Class University Project, 
a programme that started targeting high- 
profile international researchers in 2008. 
International researchers hired through 
such schemes often have a relatively easy 
path to promotion, says Stephanie Kim, a 
higher-education researcher at Georgetown 
University in Washington DC, who authored 
a 2016 study of Western academics at South
Korean universities. They also enjoy fringe 
benefits, including, in some cases, free hous- 
ing. On the downside, they are not always 
well integrated into the rest of the university. 
“They have some advantages, but they are also 
marginalized and sequestered,” says Kim. 
Still, she adds, the resources and support 
can make South Korea an especially reward- 
ing place to do research. “Unlike the United 
States, where funding for higher education 
is always dwindling, there’s actually money 
being poured into the higher-education 
sector in South Korea,” says Kim.

Although the League of Scholars data 
suggest that South Korea is making headway in 
an attempt to internationalize, Lawson notes 
that the country still lags behind other countries
in terms of the diversity of its research faculty.
According to a 2019 report by the Organisation
for Economic Co-operation and Development, 
only 6.3% of research faculty at South Korean 
universities in 2017 were foreign-born, a slight
drop from previous years. By comparison, 
28% of full-time science and engineering fac- 
ulty in the United States were born elsewhere, 
according to a 2018 report from the US National
Science Foundation. 


Chris Woolston is a freelance writer in 
Billings, Montana. 